Abstract

Optimal binary labelings, input distributions, and input alphabets are analyzed for the so-called 
bit-interleaved coded modulation (BICM) capacity, paying special attention to the low signal-to-noise 
ratio (SNR) regime. For 8-ary pulse amplitude modulation (PAM) and for 0.75 bit/symbol, the folded 
binary code results in a higher capacity than the binary reflected Gray code (BRGC) and the natural
binary code (NBC). The 1 dB gap between the additive white Gaussian noise (AWGN) capacity and 
the BICM capacity with the BRGC can be almost completely removed if the input symbol distribution 
is properly selected. First-order asymptotics of the BICM capacity for arbitrary input alphabets and 
distributions, dimensions, mean, variance, and binary labeling are developed. These asymptotics are 
used to define first-order optimal (FOO) constellations for BICM, i.e., constellations that make BICM
achieve the Shannon limit −1.59 dB. It is shown that the $E_b/N_0$ required for reliable transmission at
asymptotically low rates in BICM can be as high as infinity, that for uniform input distributions and 
8-PAM there are only 72 classes of binary labelings with a different first-order asymptotic behavior, 
and that this number is reduced to only 26 for 8-ary phase shift keying (PSK). A general answer to the 
question of FOO constellations for BICM is also given: using the Hadamard transform, it is found that 
for uniform input distributions, a constellation for BICM is FOO if and only if it is a linear projection 
of a hypercube. A constellation based on PAM or quadrature amplitude modulation input alphabets is 
FOO if and only if it is labeled by the NBC; if the constellation is based on PSK input alphabets
instead, it can never be FOO if the input alphabet has more than four points, regardless of the labeling. 
I. Introduction

The problem of reliable transmission of digital information through a noisy channel dates 
back to the works of Nyquist [1], [2] and Hartley [3] almost 90 years ago.¹ Their efforts were
capitalized on by C. E. Shannon, who formulated a unified mathematical theory of communication in
1948 [4], [5]. After he introduced the famous capacity formula for the additive white Gaussian
noise (AWGN) channel, the problem of designing a system that operates close to that limit 
has been one of the most important and challenging problems in information/communication 
theory. While low spectral efficiencies can be obtained by combining binary signaling and a 
channel encoder, high spectral efficiencies are usually obtained by using a coded modulation 
(CM) scheme based on a multilevel modulator. 

In 1974, Massey proposed the idea of jointly designing the channel encoder and modulator [7],
which inspired Ungerboeck's trellis-coded modulation (TCM) [8], [9] and Imai and Hirakawa's
multilevel coding (MLC) [10], [11]. Since both TCM and MLC aim to maximize a Euclidean
distance measure, they perform very well over the AWGN channel. However, their performance
over fading channels is rather poor. The next breakthrough came in 1992, when Zehavi introduced
the so-called bit-interleaved coded modulation (BICM) [12] (later analyzed in [13]), which is a
serial concatenation of a binary channel encoder, a bit-level interleaver, and a memoryless mapper.
BICM aims to increase the code diversity, the key performance measure in fading channels,
and therefore outperforms TCM in this scenario [13, Table III]. BICM is very attractive from an
implementation point of view because of its flexibility, i.e., the channel encoder and the modulator
can be selected independently, somehow breaking Massey's joint design paradigm. BICM is
nowadays a de facto standard, and it is used in most of the existing wireless systems, e.g., HSPA
(HSDPA and HSUPA) [14], [15, Ch. 12], IEEE 802.11a/g [16], IEEE 802.11n [17, Sec. 20.3.3],
and the latest DVB standards (DVB-T2 [18], DVB-S2 [19], and DVB-C2 [20]).

Plots of the BICM capacity vs. $E_b/N_0$ reveal that BICM does not always achieve the Shannon
limit (SL) −1.59 dB. This can be explained based on first-order asymptotics of the BICM
capacity, which were recently developed by Martinez et al. for uniform input distributions and
one- and two-dimensional input alphabets [21], [22]. It was shown that there is a bounded
loss between the BICM capacity and the SL when pulse amplitude modulation (PAM) input
alphabets labeled by the binary reflected Gray code (BRGC) are used. Recently, Stierstorfer and
Fischer showed in [23] that this is caused by the selection of the binary labeling and that equally
spaced PAM and quadrature amplitude modulation (QAM) input alphabets with uniform input
distributions labeled by the natural binary code (NBC) achieve the SL. Moreover, the same
authors showed in [24] that for low to medium signal-to-noise ratios (SNR), the NBC results
in a higher capacity than the BRGC for PAM and QAM input alphabets and uniform input
distributions.

¹ An excellent summary of the contributions that influenced Shannon's work can be found in [6, Sec. I].

The fact that the BICM capacity does not always achieve the SL raises the fundamental 
question about first-order optimal (FOO) constellations for BICM, i.e., constellations that make 
the BICM achieve the SL. In this paper, we generalize the first-order asymptotics of the BICM 
capacity presented in [21] to input alphabets with arbitrary dimensions, input distributions, mean,
variance, and binary labelings. Based on this model, we present asymptotic results for PAM and
phase shift keying (PSK) input alphabets with uniform input distribution and different binary
labelings. Our analysis is based on the so-called Hadamard transform [25, pp. 53-54], which
allows us to fully characterize FOO constellations for BICM with uniform input distributions for 
fading and nonfading channels. A complete answer to the question about FOO constellations for 
BICM with uniform input distributions is given: a constellation is FOO if and only if it is a linear 
projection of a hypercube. Furthermore, binary labelings for the traditional input alphabets PAM,
QAM, and PSK are studied. In particular, it is proven that for PAM and QAM input alphabets, 
the NBC is the only binary labeling that results in an FOO constellation. It is also proven that 
PSK input alphabets with more than four points can never yield an FOO constellation, regardless 
of the binary labeling. When 8-PAM with a uniform input distribution is considered, the folded 
binary code (FBC) results in a higher capacity than the BRGC and the NBC. Moreover, it is 
shown how the BICM capacity can be increased by properly selecting the input distribution, 
i.e., by using so-called probabilistic shaping [26]. In particular, probabilistic shaping is used to
show that PAM input alphabets labeled by the BRGC or the FBC can also be FOO, and to show 
that the 1 dB gap between the AWGN capacity and the BICM capacity with the BRGC can be 
almost completely removed. 
II. Preliminaries

A. Notation Convention 

Hereafter we use lowercase letters $x$ to denote a scalar, boldface letters $\boldsymbol{x}$ to denote a row
vector of scalars, and underlined symbols $\underline{x}$ to denote a sequence. Blackboard bold letters $\mathbb{X}$
represent matrices and $x_{i,j}$ represents the entry of $\mathbb{X}$ at row $i$, column $j$, where all the indices
start at zero. The transpose of $\mathbb{X}$ is denoted by $\mathbb{X}^T$, $\mathrm{trace}(\mathbb{X})$ is the trace of $\mathbb{X}$, and $\|\mathbb{X}\|^2$ is
$\mathrm{trace}(\mathbb{X}^T\mathbb{X})$.

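We denote random variables by capital letters $Y$, probabilities by $\Pr\{\cdot\}$, the probability
mass function (pmf) of the random vector $\boldsymbol{Y}$ by $P_{\boldsymbol{Y}}(\boldsymbol{y})$, and the probability density function
(pdf) of the random vector $\boldsymbol{Y}$ by $p_{\boldsymbol{Y}}(\boldsymbol{y})$. The joint pdf of the random vectors $\boldsymbol{X}$ and $\boldsymbol{Y}$
is denoted by $p_{\boldsymbol{X},\boldsymbol{Y}}(\boldsymbol{x},\boldsymbol{y})$ and the conditional pdf of $\boldsymbol{Y}$ conditioned on $\boldsymbol{X} = \boldsymbol{x}$ is denoted
by $p_{\boldsymbol{Y}|\boldsymbol{X}=\boldsymbol{x}}(\boldsymbol{y})$. The same notation applies to joint and conditional pmfs, i.e., $P_{\boldsymbol{X},\boldsymbol{Y}}(\boldsymbol{x},\boldsymbol{y})$ and
$P_{\boldsymbol{Y}|\boldsymbol{X}=\boldsymbol{x}}(\boldsymbol{y})$. The expectation of an arbitrary function $f(\boldsymbol{X},\boldsymbol{Y})$ over the joint pdf of $\boldsymbol{X}$ and $\boldsymbol{Y}$
is denoted by $\mathbb{E}_{\boldsymbol{X},\boldsymbol{Y}}[f(\boldsymbol{X},\boldsymbol{Y})]$, the expectation over the conditional pdf $p_{\boldsymbol{Y}|\boldsymbol{X}=\boldsymbol{x}}(\boldsymbol{y})$ is denoted
by $\mathbb{E}_{\boldsymbol{Y}|\boldsymbol{X}=\boldsymbol{x}}[f(\boldsymbol{X},\boldsymbol{Y})]$, and $\mathrm{cov}(\boldsymbol{X})$ is the covariance matrix of the random vector $\boldsymbol{X}$.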

We denote the base-2 representation of the integer $0 \le i \le M-1$, where $M = 2^m$, by the
vector $\boldsymbol{b}(i) = [b_{m-1}(i), b_{m-2}(i), \ldots, b_0(i)]$, where $b_{m-1}(i)$ is the most significant bit of $i$ and
$b_0(i)$ the least significant. To facilitate some of the developments in this paper, we also define
the ordered direct product as

$$[\boldsymbol{a}_0^T, \ldots, \boldsymbol{a}_{p-1}^T]^T \otimes [\boldsymbol{b}_0^T, \ldots, \boldsymbol{b}_{q-1}^T]^T \triangleq [\boldsymbol{c}_0^T, \ldots, \boldsymbol{c}_{pq-1}^T]^T, \tag{1}$$

where $\boldsymbol{c}_{qi+j} = [\boldsymbol{a}_i, \boldsymbol{b}_j]$ for $i = 0, \ldots, p-1$ and $j = 0, \ldots, q-1$. The ordered direct product in
(1) is analogous to the Cartesian product except that it operates on vectors/matrices instead of
sets.
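As an illustration of (1), the following minimal Python sketch builds the ordered direct product of two lists of row vectors. The function name `ordered_direct_product` and the list-of-lists representation are assumptions made for this example only; they are not notation from the paper.

```python
def ordered_direct_product(a_rows, b_rows):
    """Ordered direct product as in (1).

    a_rows has p rows and b_rows has q rows; row q*i + j of the result
    is the concatenation [a_i, b_j].
    """
    # The outer loop runs over i (rows of a), the inner loop over j (rows of b),
    # so the output index is q*i + j.
    return [list(a) + list(b) for a in a_rows for b in b_rows]


if __name__ == "__main__":
    trivial = [[0], [1]]
    print(ordered_direct_product(trivial, trivial))
    # [[0, 0], [0, 1], [1, 0], [1, 1]]
```

Taking the product of the trivial labeling with itself reproduces the four 2-bit codewords in natural order, which is the construction used for the NBC in Sec. II-B.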

B. Binary Labelings 

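A binary labeling $\mathbb{L}$ of order $m \ge 1$ is defined using an $M \times m$ matrix where each row
corresponds to one of the $M$ length-$m$ distinct binary codewords, $\mathbb{L} = [\boldsymbol{c}_0^T, \ldots, \boldsymbol{c}_{M-1}^T]^T$, where
$\boldsymbol{c}_i = [c_{i,0}, c_{i,1}, \ldots, c_{i,m-1}] \in \{0,1\}^m$.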
In order to recursively define some particular binary labelings, we first define expansions,
repetitions, and reflections of binary labelings. To expand a labeling $\mathbb{L}_m = [\boldsymbol{c}_0^T, \ldots, \boldsymbol{c}_{M-1}^T]^T$ into a
labeling $\mathbb{L}_{m+1}$, we repeat each binary codeword once to obtain a new matrix $[\boldsymbol{c}_0^T, \boldsymbol{c}_0^T, \ldots, \boldsymbol{c}_{M-1}^T, \boldsymbol{c}_{M-1}^T]^T$,
and then we obtain $\mathbb{L}_{m+1}$ by appending one extra column $[0, 1, 1, 0, 0, 1, 1, 0, \ldots, 0, 1, 1, 0]^T$
of length $2M$ [27]. To generate a labeling $\mathbb{L}_{m+1}$ from a labeling $\mathbb{L}_m = [\boldsymbol{c}_0^T, \ldots, \boldsymbol{c}_{M-1}^T]^T$ by
repetition, we repeat the labeling once to obtain a new matrix $[\boldsymbol{c}_0^T, \ldots, \boldsymbol{c}_{M-1}^T, \boldsymbol{c}_0^T, \ldots, \boldsymbol{c}_{M-1}^T]^T$,
and we add an extra column from the left, consisting of $M$ zeros followed by $M$ ones. Finally,
to generate a labeling $\mathbb{L}_{m+1}$ from a labeling $\mathbb{L}_m = [\boldsymbol{c}_0^T, \ldots, \boldsymbol{c}_{M-1}^T]^T$ by reflection, we join $\mathbb{L}_m$
and a reversed version of $\mathbb{L}_m$ to obtain a new matrix $[\boldsymbol{c}_0^T, \ldots, \boldsymbol{c}_{M-1}^T, \boldsymbol{c}_{M-1}^T, \ldots, \boldsymbol{c}_0^T]^T$, and we
add an extra column from the left, consisting of $M$ zeros followed by $M$ ones [27].

In this paper we are particularly interested in the binary reflected Gray code (BRGC) [28],
[29], the natural binary code (NBC), and the folded binary code (FBC) [30]. The FBC was
analyzed in [30] for uncoded transmission and here we will, to our knowledge for the first time,
consider it for coded transmission. In Sec. III-D and Sec. IV-C it is shown to yield a higher
capacity than other labelings under some conditions. We also introduce a new binary labeling
denoted binary semi-Gray code (BSGC). These binary labelings are generated as follows:

• The BRGC $\mathbb{G}_m$ of order $m \ge 1$ is generated by $m-1$ recursive expansions of the trivial
labeling $\mathbb{L}_1 = [0, 1]^T$, or, alternatively, by $m-1$ recursive reflections of $\mathbb{L}_1$.

• The NBC $\mathbb{N}_m$ of order $m \ge 1$ is defined as the codewords $\boldsymbol{c}_i$ that are the base-2
representations of the integers $i = 0, \ldots, M-1$, i.e., $\mathbb{N}_m = [\boldsymbol{b}(0)^T, \ldots, \boldsymbol{b}(M-1)^T]^T$. Alternatively,
$\mathbb{N}_m$ can be generated by $m-1$ recursive repetitions of the trivial labeling $\mathbb{L}_1$, or as $m-1$
ordered direct products of $\mathbb{L}_1$ with itself.

• The BSGC of order $m \ge 3$ is generated by replacing the first column of $\mathbb{G}_m$ by the
modulo-2 sum of the first and last columns.

• The FBC of order $m \ge 2$ is generated by one reflection of $\mathbb{N}_{m-1}$.
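The recursive constructions above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names (`expand`, `repeat`, `reflect`, `brgc`, `nbc`, `fbc`, `bsgc`) and the list-of-lists representation of a labeling are assumptions of this example, with column 0 taken as the leftmost bit.

```python
def expand(lab):
    """Repeat each codeword once, then append the column 0,1,1,0,0,1,1,0,..."""
    pattern = [0, 1, 1, 0]
    doubled = [c for c in lab for _ in range(2)]
    return [list(c) + [pattern[i % 4]] for i, c in enumerate(doubled)]


def repeat(lab):
    """Stack the labeling twice and prepend M zeros followed by M ones."""
    return [[0] + list(c) for c in lab] + [[1] + list(c) for c in lab]


def reflect(lab):
    """Stack the labeling and its reversal, prepend M zeros followed by M ones."""
    return [[0] + list(c) for c in lab] + [[1] + list(c) for c in reversed(lab)]


L1 = [[0], [1]]  # trivial labeling of order 1


def brgc(m):
    lab = L1
    for _ in range(m - 1):
        lab = reflect(lab)
    return lab


def nbc(m):
    # Row i is the base-2 representation of i, most significant bit first.
    return [[(i >> (m - 1 - k)) & 1 for k in range(m)] for i in range(2 ** m)]


def fbc(m):
    return reflect(nbc(m - 1))


def bsgc(m):
    # Replace the first column of the BRGC by the modulo-2 sum of the
    # first and last columns.
    return [[c[0] ^ c[-1]] + c[1:] for c in brgc(m)]


def as_strings(lab):
    return ["".join(str(b) for b in c) for c in lab]


if __name__ == "__main__":
    print("BRGC", as_strings(brgc(3)))  # 000 001 011 010 110 111 101 100
    print("NBC ", as_strings(nbc(3)))   # 000 001 010 011 100 101 110 111
    print("FBC ", as_strings(fbc(3)))   # 000 001 010 011 111 110 101 100
    print("BSGC", as_strings(bsgc(3)))  # 000 101 111 010 110 011 001 100
```

The sketch can also be used to check that the two alternative constructions agree, e.g., that two expansions of the trivial labeling give the same matrix as two reflections, and that two repetitions give the NBC.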

For any labeling matrix $\mathbb{L} = [\boldsymbol{c}_0^T, \ldots, \boldsymbol{c}_{M-1}^T]^T$, where $\boldsymbol{c}_i = [c_{i,0}, \ldots, c_{i,m-1}] \in \{0,1\}^m$, we
define a modified labeling matrix $\mathbb{Q} = \mathbb{Q}(\mathbb{L})$ which is obtained by reversing the order of the
columns and applying the mapping $(0 \to 1, 1 \to -1)$, i.e.,

$$q_{i,k} = \begin{cases} -1, & \text{if } c_{i,m-1-k} = 1 \\ \phantom{-}1, & \text{if } c_{i,m-1-k} = 0 \end{cases} \tag{2}$$

with $i = 0, \ldots, M-1$ and $k = 0, \ldots, m-1$.
Example 1 (Binary labelings of order m = 3): 
C. Constellations and Input Distributions

Throughout this paper, we use $\mathcal{X}$ to represent the set of symbols used for transmission. Each
element of $\mathcal{X}$ is an $N$-dimensional symbol $\boldsymbol{x}_i$, $i = 0, \ldots, M-1$, where $|\mathcal{X}| = M = 2^m$ and
$\mathcal{X} \subset \mathbb{R}^N$. We define the input alphabet using an $M \times N$ matrix $\mathbb{X} = [\boldsymbol{x}_0^T, \ldots, \boldsymbol{x}_{M-1}^T]^T$ which
contains all the elements of $\mathcal{X}$.

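For practical reasons, we are interested in well-structured input alphabets. An $M$-PAM input
alphabet is defined by the column vector $\mathbb{X}_{\mathrm{PAM}}$ where $x_i = -(M - 2i - 1)$ with $i = 0, \ldots, M-1$.
An $M$-PSK input alphabet is the matrix $\mathbb{X}_{\mathrm{PSK}}$ where
$\boldsymbol{x}_i = \left[\cos\frac{(2i+1)\pi}{M}, \sin\frac{(2i+1)\pi}{M}\right]$ with $i = 0, 1, \ldots, M-1$.
Finally, a rectangular $(M' \times M'')$-QAM input alphabet is the $M'M'' \times 2$ matrix
$\mathbb{X}_{\mathrm{QAM}} = \mathbb{X}'_{\mathrm{PAM}} \otimes \mathbb{X}''_{\mathrm{PAM}}$, where $\mathbb{X}'_{\mathrm{PAM}}$ and $\mathbb{X}''_{\mathrm{PAM}}$ are vectors of length $M'$ and $M''$, respectively.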
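The three structured alphabets can be generated directly from their definitions. The sketch below is illustrative only; the function names `pam`, `psk`, and `qam` are hypothetical, each alphabet is stored as a list of rows (one symbol per row), and no energy normalization is applied.

```python
import math


def pam(M):
    """M-PAM column vector with x_i = -(M - 2i - 1), i = 0, ..., M-1."""
    return [[-(M - 2 * i - 1)] for i in range(M)]


def psk(M):
    """M-PSK matrix with rows [cos((2i+1)pi/M), sin((2i+1)pi/M)]."""
    return [
        [math.cos((2 * i + 1) * math.pi / M), math.sin((2 * i + 1) * math.pi / M)]
        for i in range(M)
    ]


def qam(M1, M2):
    """Rectangular (M1 x M2)-QAM as the ordered direct product of two PAM vectors."""
    # Row M2*i + j is [x'_i, x''_j], following the ordered direct product in (1).
    return [a + b for a in pam(M1) for b in pam(M2)]


if __name__ == "__main__":
    print(pam(8))     # [[-7], [-5], [-3], [-1], [1], [3], [5], [7]]
    print(qam(2, 2))  # [[-1, -1], [-1, 1], [1, -1], [1, 1]]
    print(psk(4))
```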
For a given input alphabet $\mathbb{X}$, the input distribution of the symbols is denoted by the pmf
$P_{\boldsymbol{X}}(\boldsymbol{x})$, which represents the probabilities of transmitting the symbols $\boldsymbol{x}$, i.e., $\Pr\{\boldsymbol{X} = \boldsymbol{x}\}$. We
define the matrix $\mathbb{P}$ as an ordered list containing the probabilities of the symbols, i.e.,
$\mathbb{P} = [P_{\boldsymbol{X}}(\boldsymbol{x}_0), \ldots, P_{\boldsymbol{X}}(\boldsymbol{x}_{M-1})]^T$. We use $\mathbb{U}_M \triangleq [1/M, \ldots, 1/M]^T$ to denote the discrete uniform
input distribution.

We define a constellation as the list of matrices $\Omega \triangleq [\mathbb{X}, \mathbb{L}, \mathbb{P}]$, i.e., an input alphabet using
a given labeling and input distribution. Finally, for a given pair $[\mathbb{X}, \mathbb{L}]$, we denote with
$\mathcal{I}_{k,u} \subset \{0, \ldots, M-1\}$ the set of indexes of the symbols with a binary label $u \in \{0, 1\}$ at bit position
$k \in \{0, \ldots, m-1\}$, i.e., $\mathcal{I}_{k,u} \triangleq \{i \in \{0, \ldots, M-1\} : c_{i,k} = u\}$.

D. System Model 

In this paper, we analyze coded modulation (CM) schemes such as the one shown in Fig. 1. Each of
the $K$ possible messages is represented by the binary vector $\boldsymbol{w} \in \{0, 1\}^{k_c}$, where $k_c = \log_2 K$.
The transmitter maps each message to a sequence $\underline{\boldsymbol{x}} = [\boldsymbol{x}(0)^T, \ldots, \boldsymbol{x}(N_s - 1)^T]^T \in \mathcal{X}^{N_s}$, which
corresponds to $N_s$ $N$-dimensional symbols ($N_s$ channel uses²). The code $\mathcal{C}$ is a subset of $\mathcal{X}^{N_s}$
such that $|\mathcal{C}| = K$, which is used for transmission. The transmitter is then defined as a one-to-one
function that assigns each information message $\boldsymbol{w}$ to one of the $K$ possible sequences $\underline{\boldsymbol{x}} \in \mathcal{C}$.
The code rate in information bits per coded bit is then given by $R = k_c/(mN_s)$ or, equivalently,
$R_c = k_c/N_s$ information bits per channel use (information bits per symbol, or information bits
per $N$ real dimensions). At the receiver's side, based on the channel observations, a maximum
likelihood sequence receiver generates an estimate of the information bits $\hat{\boldsymbol{w}}$ by selecting the most
likely transmitted message.

We consider transmissions over a discrete-time memoryless fast fading channel

$$\boldsymbol{Y}(t) = \boldsymbol{H}(t) \circ \boldsymbol{X}(t) + \boldsymbol{Z}(t), \tag{3}$$

where the operator $\circ$ denotes the so-called Schur product (element-wise product) between two
vectors, $\boldsymbol{X}(t)$, $\boldsymbol{H}(t)$, $\boldsymbol{Y}(t)$, and $\boldsymbol{Z}(t)$ are the underlying random vectors for $\boldsymbol{x}(t)$, $\boldsymbol{h}(t)$, $\boldsymbol{y}(t)$, and
$\boldsymbol{z}(t)$ respectively, with $t = 0, \ldots, N_s - 1$ being the discrete time index, and $\boldsymbol{Z}(t)$ is a Gaussian
noise with zero mean and variance $N_0/2$ in each dimension. The channel is represented by the
$N$-dimensional vector $\boldsymbol{H}(t)$, and it contains real fading coefficients $H_i$ which are assumed to be
random variables, possibly dependent, with the same pdf $p_H(h)$. We assume that $\boldsymbol{H}(t)$ and $N_0$ are
perfectly known at the receiver or can be perfectly estimated. Since the channel is memoryless,
from now on we drop the index $t$.

Figure 1. A CM system based on a BICM structure: A binary channel encoder, a bit-level interleaver, a memoryless mapper,
the fading channel, and the inverse processes at the receiver side.

² A "channel use" corresponds to the transmission of one $N$-dimensional symbol, i.e., it can be considered as a "vectorial
channel use".

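The conditional transition pdf of the channel in (3) is given by

$$p_{\boldsymbol{Y}|\boldsymbol{X}=\boldsymbol{x},\boldsymbol{H}=\boldsymbol{h}}(\boldsymbol{y}) = \frac{1}{(N_0\pi)^{N/2}} \exp\left(-\frac{\|\boldsymbol{y} - \boldsymbol{h} \circ \boldsymbol{x}\|^2}{N_0}\right). \tag{4}$$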

We assume that both $\boldsymbol{H}$ and $\boldsymbol{X}$ have finite and nonzero second moments, that $\boldsymbol{X}$, $\boldsymbol{H}$, and $\boldsymbol{Z}$
are mutually independent, and that there exists a constant $\omega > 0$ such that for all sufficiently
large $\Delta > 0$ the vector $\boldsymbol{H}$ satisfies

$$\Pr\{\|\boldsymbol{H}\|^2 > \Delta\} < \exp(-\Delta^{\omega}). \tag{5}$$

This condition will be used in the proof of Theorem 7 in Sec. IV-C.

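Each transmitted symbol conveys $R_c$ information bits and thus, the relation between the
average symbol energy $E_s \triangleq \mathbb{E}_{\boldsymbol{X}}[\|\boldsymbol{X}\|^2]$ and the average information bit energy $E_b$ is given by
$E_s = R_c E_b$. We define the average signal-to-noise ratio (SNR) as

$$\mathsf{SNR} \triangleq \frac{\mathbb{E}_{\boldsymbol{H},\boldsymbol{X}}[\|\boldsymbol{H} \circ \boldsymbol{X}\|^2]}{N_0} = \frac{E_s}{N_0}\,\mathbb{E}_H[H^2] = R_c \frac{E_b}{N_0}\,\mathbb{E}_H[H^2]. \tag{6}$$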

The AWGN channel is obtained as a special case of (3) by taking $\boldsymbol{H}$ as the all-one vector.
Another particular case is obtained when $H_0 = H_1 = \ldots = H_{N-1} = A$, which particularizes to the
Rayleigh fading channel when $A = \sqrt{A_1^2 + A_2^2}$ and $A_1$, $A_2$ are independent zero-mean Gaussian
random variables. In this case, the instantaneous SNR defined by $\mathbb{E}_{\boldsymbol{X}}[\|\boldsymbol{H} \circ \boldsymbol{X}\|^2]/N_0 = A^2 E_s/N_0$
follows a chi-square distribution with two degrees of freedom (an exponential distribution).
Similarly, the Nakagami-$m$ fading channel is obtained when $A$ follows a Nakagami-$m$ distribution.
It can be shown that the condition (5) is fulfilled in all the cases above.

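In a BICM system [12], [13], the transmitter in Fig. 1 is realized using a serial concatenation
of a binary encoder of rate $R = R_c/m$, a bit-level interleaver, and a memoryless mapper $\Phi$.
The mapper $\Phi$ is defined as a one-to-one mapping rule that maps the length-$m$ binary random
vector $\boldsymbol{C} = [C_0, \ldots, C_{m-1}]$ to one symbol $\boldsymbol{X}$, i.e., $\Phi : \{0, 1\}^m \to \mathcal{X}$. At the receiver's side, the
demapper computes soft information on the coded bits, which are then deinterleaved and passed
to the channel decoder. The a posteriori L-values for the $k$th bit in the symbol and for a given
fading realization are given by

$$l_k(\boldsymbol{y}) \triangleq \log \frac{\Pr\{\boldsymbol{Y} = \boldsymbol{y} \mid C_k = 1, \boldsymbol{H} = \boldsymbol{h}\}}{\Pr\{\boldsymbol{Y} = \boldsymbol{y} \mid C_k = 0, \boldsymbol{H} = \boldsymbol{h}\}} \tag{7}$$

$$= \log \frac{\sum_{i \in \mathcal{I}_{k,1}} \exp\left(-\|\boldsymbol{y} - \boldsymbol{h} \circ \boldsymbol{x}_i\|^2 / N_0\right)}{\sum_{i \in \mathcal{I}_{k,0}} \exp\left(-\|\boldsymbol{y} - \boldsymbol{h} \circ \boldsymbol{x}_i\|^2 / N_0\right)} \tag{8}$$

$$\approx \frac{1}{N_0} \sum_{u \in \{0,1\}} (-1)^{u} \min_{i \in \mathcal{I}_{k,u}} \|\boldsymbol{y} - \boldsymbol{h} \circ \boldsymbol{x}_i\|^2, \tag{9}$$

where to pass from (8) to (9), the so-called max-log [31] approximation was used.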
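To make the max-log step concrete, the following sketch computes, for a scalar observation, both a log-sum-exp L-value over equally weighted symbols and its max-log approximation (difference of the two minimum squared distances, scaled by 1/N0). The function name `l_values` and its arguments are assumptions of this example; equal symbol weights within each bit subset are also assumed here.

```python
import math


def l_values(y, h, N0, alphabet, labeling, k):
    """Return (exact, maxlog) L-values of bit k for a scalar observation y.

    alphabet: list of real symbols x_i; labeling: list of bit lists c_i.
    Symbols inside each subset are assumed equally weighted.
    """
    d2 = [(y - h * x) ** 2 for x in alphabet]
    # Squared distances split according to the value of bit k.
    subsets = {u: [d2[i] for i, c in enumerate(labeling) if c[k] == u] for u in (0, 1)}

    def log_sum_exp(dists):
        # log(sum_i exp(-d_i / N0)), stabilized by factoring out the smallest distance.
        d_min = min(dists)
        return -d_min / N0 + math.log(sum(math.exp(-(d - d_min) / N0) for d in dists))

    exact = log_sum_exp(subsets[1]) - log_sum_exp(subsets[0])
    maxlog = (min(subsets[0]) - min(subsets[1])) / N0
    return exact, maxlog


if __name__ == "__main__":
    alphabet = [-3.0, -1.0, 1.0, 3.0]            # 4-PAM
    labeling = [[0, 0], [0, 1], [1, 1], [1, 0]]  # BRGC of order 2
    for k in range(2):
        print(k, l_values(0.5, 1.0, 1.0, alphabet, labeling, k))
```

For a subset of n symbols, the log-sum-exp term and its max-log replacement differ by at most log n, so the two L-values in this 4-PAM example can differ by at most log 2 in magnitude.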

The max-log metric in (9) (already proposed in [12], [13]) is suboptimal; however, it is very
popular in practical implementations because of its low complexity, e.g., in the 3rd generation
partnership project (3GPP) working groups [32]. It is also known that when Gray-labeled
constellations are used, the use of this simplification results in a negligible impact on the
receiver's performance [33, Fig. 9], [34, Fig. 6]. The max-log approximation also allows BICM
implementations which do not require the knowledge of $N_0$, for example, when a Viterbi decoder
is used, or when the demapper passes hard decisions to the decoder. Moreover, the use of the
max-log approximation transforms the nonlinear relation in (8) into a piecewise linear relation.
This has been used to develop expressions for the pdf of the L-values in (9) using arbitrary
input alphabets [35] (based on an algorithmic approach), closed-form expressions for QAM
input alphabets labeled by the BRGC for the AWGN channel [36], [37], and for fading channels
[38]. Recently, closed-form approximations for the pdf of the L-values in (9) for arbitrary input
alphabets and binary labeling in fading channels have been presented [39].
E. The Hadamard Transform

The Hadamard transform (HT) is a discrete, linear, orthogonal transform, like for example the 
Fourier transform, but its coefficients take values in ±1 only. Among the different applications 
that the HT has, one that is often overlooked is as an analysis tool for binary labelings [40].

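The HT is defined by means of an $M \times M$ matrix, the Hadamard matrix, which is defined
recursively as follows when $M$ is a power of two [25, pp. 53-54].

$$\mathbb{H}_1 \triangleq 1, \qquad \mathbb{H}_{2M} \triangleq \begin{bmatrix} \mathbb{H}_M & \mathbb{H}_M \\ \mathbb{H}_M & -\mathbb{H}_M \end{bmatrix}, \quad M \ge 1.$$

Example 2 (Hadamard matrix $\mathbb{H}_8$):

$$\mathbb{H}_8 = \begin{bmatrix}
+1 & +1 & +1 & +1 & +1 & +1 & +1 & +1 \\
+1 & -1 & +1 & -1 & +1 & -1 & +1 & -1 \\
+1 & +1 & -1 & -1 & +1 & +1 & -1 & -1 \\
+1 & -1 & -1 & +1 & +1 & -1 & -1 & +1 \\
+1 & +1 & +1 & +1 & -1 & -1 & -1 & -1 \\
+1 & -1 & +1 & -1 & -1 & +1 & -1 & +1 \\
+1 & +1 & -1 & -1 & -1 & -1 & +1 & +1 \\
+1 & -1 & -1 & +1 & -1 & +1 & +1 & -1
\end{bmatrix} \tag{10}$$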



In the following, we will drop the index, letting $\mathbb{H}$ represent a Hadamard matrix of any size
$M = 2^m$. Hadamard matrices have the following appealing properties.

$$\mathbb{H}^T = \mathbb{H}, \qquad \mathbb{H}^{-1} = \frac{1}{M}\mathbb{H}. \tag{11}$$



It can be shown [42, Sec. 1.1], [43, Sec. III] that the elements of a Hadamard matrix are
$h_{i,j} = \prod_{k=0}^{m-1} (-1)^{b_k(i) b_k(j)}$, from which we observe for future use that for all $i = 0, \ldots, M-1$ and
$l = 0, \ldots, m-1$,

$$h_{i,0} = 1, \qquad h_{i,2^l} = (-1)^{b_l(i)}, \tag{12}$$

where $b_l(i)$ is the $l$th bit of the base-2 representation of the integer $i$.

At this point it is interesting to note the close relation between the columns of the matrix
$\mathbb{Q}(\mathbb{N}_3)$ in Example 1 and the columns $2^l$ of $\mathbb{H}_8$ in (10) for $l = 0, 1, 2$. Its generalization is given
by the following lemma, whose proof follows immediately from (2), the definition of the NBC
in Sec. II-B, and (12).
Lemma 1: Let $\mathbb{Q} = \mathbb{Q}(\mathbb{N}_m)$ be the modified labeling matrix for the NBC of order $m$, and let
$\mathbb{H}$ be the Hadamard matrix. For any $m$, and for $k = 0, \ldots, m-1$ and $i = 0, \ldots, M-1$,

$$q_{i,k} = h_{i,2^k}. \tag{13}$$
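Lemma 1 can be checked numerically for small orders. The sketch below builds the Hadamard matrix by the recursive doubling rule, forms the modified labeling matrix of the NBC as in (2), and compares the two sides of (13). The helper names `hadamard`, `nbc`, and `modified_labeling` are assumptions of this example.

```python
def hadamard(M):
    """Hadamard matrix of size M (a power of two) via H_2M = [[H, H], [H, -H]]."""
    H = [[1]]
    while len(H) < M:
        H = [r + r for r in H] + [r + [-v for v in r] for r in H]
    return H


def nbc(m):
    # Row i is the base-2 representation of i, most significant bit first.
    return [[(i >> (m - 1 - k)) & 1 for k in range(m)] for i in range(2 ** m)]


def modified_labeling(lab):
    """Q(L) as in (2): reverse the column order and map 0 -> +1, 1 -> -1."""
    return [[1 - 2 * bit for bit in reversed(c)] for c in lab]


def lemma1_holds(m):
    M = 2 ** m
    H = hadamard(M)
    Q = modified_labeling(nbc(m))
    return all(Q[i][k] == H[i][2 ** k] for i in range(M) for k in range(m))


if __name__ == "__main__":
    for m in range(1, 6):
        print(m, lemma1_holds(m))
```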

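The HT operates on a vector of length $M = 2^m$, for any integer $m$, or in a more general case,
on a matrix with $M = 2^m$ rows. The transform of a matrix $\mathbb{X}$ is denoted $\tilde{\mathbb{X}}$ and has the same
dimensions as $\mathbb{X}$. It is defined as

$$\tilde{\mathbb{X}} \triangleq \frac{1}{M}\mathbb{H}\mathbb{X} \tag{14}$$

and the inverse transform is $\mathbb{X} = \mathbb{H}\tilde{\mathbb{X}}$. Equivalently,

$$\tilde{\boldsymbol{x}}_j = \frac{1}{M}\sum_{i=0}^{M-1} h_{j,i}\,\boldsymbol{x}_i, \qquad \boldsymbol{x}_i = \sum_{j=0}^{M-1} h_{i,j}\,\tilde{\boldsymbol{x}}_j, \tag{15}$$

where from (11) we have that $h_{j,i} = h_{i,j}$, and where we have introduced the row vectors $\boldsymbol{x}_i$ and
$\tilde{\boldsymbol{x}}_j$ such that

$$\mathbb{X} = [\boldsymbol{x}_0^T, \ldots, \boldsymbol{x}_{M-1}^T]^T, \qquad \tilde{\mathbb{X}} = [\tilde{\boldsymbol{x}}_0^T, \ldots, \tilde{\boldsymbol{x}}_{M-1}^T]^T.$$

Because of (12), the first element of the transform is simply $\tilde{\boldsymbol{x}}_0 = \frac{1}{M}\sum_{i=0}^{M-1} \boldsymbol{x}_i$.

Finally, using $\sum_{i=0}^{M-1} \|\boldsymbol{x}_i\|^2 = \mathrm{trace}(\mathbb{X}^T\mathbb{X})$, (14), and (11), we note that a variant of Parseval's
theorem holds:

$$\sum_{j=0}^{M-1} \|\tilde{\boldsymbol{x}}_j\|^2 = \frac{1}{M}\sum_{i=0}^{M-1} \|\boldsymbol{x}_i\|^2. \tag{16}$$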
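The transform pair in (14) and (15) and the Parseval-type identity (16) can be verified on a small example. The following sketch applies the HT to the 8-PAM alphabet of Sec. II-C; the function names `hadamard`, `ht`, and `iht` are assumptions of this example, and matrices are stored as lists of rows.

```python
def hadamard(M):
    """Hadamard matrix of size M (a power of two) via H_2M = [[H, H], [H, -H]]."""
    H = [[1]]
    while len(H) < M:
        H = [r + r for r in H] + [r + [-v for v in r] for r in H]
    return H


def ht(X):
    """Hadamard transform (14): X_tilde = (1/M) H X for an M x N matrix X."""
    M, N = len(X), len(X[0])
    H = hadamard(M)
    return [[sum(H[j][i] * X[i][n] for i in range(M)) / M for n in range(N)]
            for j in range(M)]


def iht(Xt):
    """Inverse transform: X = H X_tilde."""
    M, N = len(Xt), len(Xt[0])
    H = hadamard(M)
    return [[sum(H[i][j] * Xt[j][n] for j in range(M)) for n in range(N)]
            for i in range(M)]


def energy(X):
    return sum(v * v for row in X for v in row)


if __name__ == "__main__":
    X = [[-(8 - 2 * i - 1)] for i in range(8)]   # 8-PAM: -7, -5, ..., 7
    Xt = ht(X)
    print([row[0] for row in Xt])        # [0.0, -1.0, -2.0, 0.0, -4.0, 0.0, 0.0, 0.0]
    print(energy(Xt), energy(X) / 8)     # 21.0 21.0
    print(iht(Xt) == X)                  # True
```

For this alphabet in its natural order, only the transform coefficients at indices 1, 2, and 4 are nonzero, and the first coefficient equals the mean of the alphabet, in agreement with (12).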

III. Capacity of Coded Modulation Systems 

In this section we analyze the capacity of CM schemes, i.e., the so-called CM and BICM 
capacities. We review their relation and we analyze how the selection of the constellation 
influences them. We pay special attention to the selection of the binary labeling and the use 
of probabilistic shaping for BICM. 
A. AMI and Channel Capacity

In this subsection, we assume the use of a continuous input alphabet, i.e., $\mathcal{X} = \mathbb{R}^N$, which
upper-bounds the performance of finite input alphabets.

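The average mutual information (AMI) in bits³ per channel use between the random vectors
$\boldsymbol{X}$ and $\boldsymbol{Y}$ when the channel is perfectly known at the receiver is defined as

$$I_{\boldsymbol{X}}(\boldsymbol{X}; \boldsymbol{Y}) \triangleq \mathbb{E}_{\boldsymbol{X},\boldsymbol{Y}}\left[\log_2 \frac{p_{\boldsymbol{X},\boldsymbol{Y}}(\boldsymbol{X},\boldsymbol{Y})}{p_{\boldsymbol{Y}}(\boldsymbol{Y})\,p_{\boldsymbol{X}}(\boldsymbol{X})}\right] \tag{17}$$

$$= \mathbb{E}_{\boldsymbol{X},\boldsymbol{Y}}\left[\log_2 \frac{p_{\boldsymbol{Y}|\boldsymbol{X}}(\boldsymbol{Y})}{p_{\boldsymbol{Y}}(\boldsymbol{Y})}\right], \tag{18}$$

where we use $\boldsymbol{X}$ as the index of $I_{\boldsymbol{X}}(\boldsymbol{X}; \boldsymbol{Y})$ to emphasize the fact that the AMI depends on the
input pdf $p_{\boldsymbol{X}}(\boldsymbol{x})$. For an arbitrary channel parameter $\boldsymbol{H}$, the AMI in (17) can be expressed as⁴

$$I_{\boldsymbol{X}}(\boldsymbol{X}; \boldsymbol{Y}) = \mathbb{E}_{\boldsymbol{X},\boldsymbol{Y},\boldsymbol{H}}\left[\log_2 \frac{p_{\boldsymbol{Y}|\boldsymbol{X},\boldsymbol{H}}(\boldsymbol{Y})}{p_{\boldsymbol{Y}|\boldsymbol{H}}(\boldsymbol{Y})}\right], \tag{19}$$

where $p_{\boldsymbol{Y}|\boldsymbol{X}=\boldsymbol{x},\boldsymbol{H}=\boldsymbol{h}}(\boldsymbol{y})$ is given by (4).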

The channel capacity of a continuous-input continuous-output memoryless channel is defined
as the maximum AMI between its input and output [44, Ch. 4], [45, eq. (3)]

$$C(\mathsf{SNR}) \triangleq \max_{p_{\boldsymbol{X}}(\boldsymbol{x})} I_{\boldsymbol{X}}(\boldsymbol{X}; \boldsymbol{Y}), \tag{20}$$

where the maximization is over all possible input distributions. The capacity in (20) has units
of [bit/channel use] (or equivalently [bit/symbol]), and it is an upper bound on the number of
bits per symbol that can be reliably transmitted through the channel, where a symbol consists of
$N$ real dimensions. Shannon's channel coding theorem states that it is not possible to transmit
information reliably above this fundamental limit, i.e.,

$$R_c \le C(\mathsf{SNR}) = C\left(R_c \frac{E_b}{N_0}\,\mathbb{E}_H[H^2]\right). \tag{21}$$



³ Throughout this paper all the AMIs are given in bits.

⁴ We note that the AMI with perfect channel state information is usually denoted by $I_{\boldsymbol{X}}(\boldsymbol{X}; \boldsymbol{Y}|\boldsymbol{H})$; however, for notational
simplicity, we use $I_{\boldsymbol{X}}(\boldsymbol{X}; \boldsymbol{Y})$.

The AWGN capacity, denoted by $C^{\mathrm{AW}}(\mathsf{SNR})$, is defined as the channel capacity of the AWGN
channel (obtained from (3) using $\boldsymbol{H}(t) = \boldsymbol{1}$), and it is given by [44, Sec. 9.4]

$$C^{\mathrm{AW}}(\mathsf{SNR}) = \frac{N}{2}\log_2\left(1 + \frac{2}{N}\mathsf{SNR}\right). \tag{22}$$

This capacity is attained when $\boldsymbol{X}$ are i.i.d. zero-mean Gaussian random variables with variance
$E_s/N$ in each dimension and it follows from the fact that the noise is independent in each
dimension, and thus, the transmission of $\boldsymbol{X}$ can be considered as a transmission through $N$
parallel independent Gaussian channels.
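The AWGN capacity in (22), combined with the relation SNR = R_c E_b/N_0 from (6) for the AWGN channel, gives the minimum $E_b/N_0$ needed at a given rate, and its low-rate limit is the Shannon limit of −1.59 dB. The short sketch below evaluates both; the function names `awgn_capacity` and `ebn0_required_db` are hypothetical, and the closed-form inversion of (22) is a derivation made for this example.

```python
import math


def awgn_capacity(snr, N=1):
    """AWGN capacity (22) in bit/symbol for N real dimensions."""
    return N / 2 * math.log2(1 + 2 * snr / N)


def ebn0_required_db(Rc, N=1):
    """Minimum Eb/N0 in dB for rate Rc on the AWGN channel.

    Obtained by solving Rc = C_AW(Rc * Eb/N0) for Eb/N0:
        Eb/N0 = N * (2**(2*Rc/N) - 1) / (2*Rc).
    """
    ebn0 = N * (2 ** (2 * Rc / N) - 1) / (2 * Rc)
    return 10 * math.log10(ebn0)


if __name__ == "__main__":
    for Rc in (2.0, 1.0, 0.75, 0.1, 1e-3, 1e-6):
        print(Rc, round(ebn0_required_db(Rc), 3))
    # As Rc tends to zero the required Eb/N0 tends to 10*log10(ln 2) dB.
    print(round(10 * math.log10(math.log(2)), 2))  # -1.59
```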

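We define the conditional AMI for discrete input alphabets as the AMI between $\boldsymbol{X}$ and $\boldsymbol{Y}$
conditioned on the outcome of a third random variable $U$, i.e.,

$$I_{\boldsymbol{X}|U=u}(\boldsymbol{X}; \boldsymbol{Y}) \triangleq \mathbb{E}_{\boldsymbol{X},\boldsymbol{Y}|U=u}\left[\log_2 \frac{p_{\boldsymbol{Y}|\boldsymbol{X},U=u}(\boldsymbol{Y})}{p_{\boldsymbol{Y}|U=u}(\boldsymbol{Y})}\right] \tag{23}$$

$$= \mathbb{E}_{\boldsymbol{X},\boldsymbol{Y},\boldsymbol{H}|U=u}\left[\log_2 \frac{p_{\boldsymbol{Y}|\boldsymbol{X},\boldsymbol{H},U=u}(\boldsymbol{Y})}{p_{\boldsymbol{Y}|\boldsymbol{H},U=u}(\boldsymbol{Y})}\right], \tag{24}$$

which is valid for any random $\boldsymbol{H}$.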



B. CM Capacity 

The CM capacity is defined as the AMI between $\boldsymbol{X}$ and $\boldsymbol{Y}$ for a given constellation $\Omega$, i.e.,

$$I_{\Omega}^{\mathrm{CM}}(\mathsf{SNR}) \triangleq I_{\boldsymbol{X}}(\boldsymbol{X}; \boldsymbol{Y}) \tag{25}$$

$$= I_{\boldsymbol{X}}(\boldsymbol{C}; \boldsymbol{Y}) \tag{26}$$

$$= \sum_{k=0}^{m-1} I_{\boldsymbol{X}}(C_k; \boldsymbol{Y} | C_0, \ldots, C_{k-1}), \tag{27}$$

where to pass from (25) to (26), we used the fact that the mapping rule between $\boldsymbol{C}$ and $\boldsymbol{X}$
is one-to-one. To pass from (26) to (27) we have used the chain rule of mutual information
[44, Sec. 2.5], where $I_{\boldsymbol{X}}(C_k; \boldsymbol{Y} | C_0, \ldots, C_{k-1})$ is a bit-level AMI which represents the
maximum rate that can be used at the $(k+1)$th bit position, given a perfect knowledge of the
previous $k$ bits.

The CM capacity in (25) corresponds to the capacity of the memoryless "CM channel" in Fig. 1
for a given constellation $\Omega$. We note that different binary labelings will produce different values
of $I_{\boldsymbol{X}}(C_k; \boldsymbol{Y} | C_0, \ldots, C_{k-1})$ in (27); however, the overall sum will remain constant, i.e., the CM
capacity does not depend on the binary labeling. We use the name "CM capacity" for $I_{\Omega}^{\mathrm{CM}}(\mathsf{SNR})$
in (25) following the standard terminology⁵ used in the literature (cf. [13], [21], [23], [48], [49]),
although we recognize the misuse of the word capacity since no optimization over the input
distribution is performed (cf. (20)). Moreover, it is also possible to optimize the input alphabet in
order to obtain an increase in the AMI (so-called signal shaping [50]). Nevertheless, throughout
this paper we will refer to the AMI for a given $\Omega$ in (25) as the CM capacity.

⁵ Sometimes, this is also called joint capacity [26], or (constellation) constrained capacity [46], [47].

In this paper we are interested in optimal constellations, and therefore, we define the maximum
CM capacity as

$$C^{\mathrm{CM}}(\mathsf{SNR}) \triangleq \max_{\mathbb{X},\mathbb{P}} I_{\Omega}^{\mathrm{CM}}(\mathsf{SNR}) \tag{28}$$

$$= \max_{\mathbb{X},\mathbb{P}} \sum_{k=0}^{m-1} I_{\boldsymbol{X}}(C_k; \boldsymbol{Y} | C_0, \ldots, C_{k-1}). \tag{29}$$

As mentioned before, the CM capacity does not depend on the binary labeling, i.e., it does not
depend on how the mapping rule $\Phi$ is implemented, and therefore, in (29) we only show two
optimization parameters: the input alphabet and the input distribution.

The CM capacity in (25) (for a given constellation $\Omega$) is an upper bound on the number
of bits per symbol that can be reliably transmitted using for example TCM [9] or MLC
with multistage decoding (MLC-MSD) [10], [51]. MLC-MSD is in fact a direct application
of the summation in (27), i.e., $m$ parallel encoders are used, each of them having a rate
$R_k = I_{\boldsymbol{X}}(C_k; \boldsymbol{Y} | C_0, \ldots, C_{k-1})$. At the receiver's side, the first bit level is decoded and the
decisions are passed to the second decoder, which then passes the decisions to the third decoder,
and so on. Other design rules can also be applied in MLC, cf. [51]. The maximum CM capacity
$C^{\mathrm{CM}}(\mathsf{SNR})$ in (29) represents an upper bound on the number of bits per symbol that can be
reliably transmitted using a fully optimized system, i.e., a system where for each SNR value,
the input alphabet and the input distribution are selected in order to maximize the CM
capacity $I_{\Omega}^{\mathrm{CM}}(\mathsf{SNR})$.

C. BICM with Arbitrary Input Distributions 

It is commonly assumed that the sequence generated by the binary encoder in Fig. 1 is infinitely
long and symmetric, and also that the interleaver ($\pi$) operates over this infinite sequence, simply
permuting it in a random way. Under these standard assumptions, it follows that the input symbol
distribution will always be $\mathbb{P} = \mathbb{U}_M$. Since in this paper we are interested in analyzing a more
general setup where the input symbol distribution can be modified, we develop a more general
model in which we relax the equiprobable input distribution assumption.
Let $C_k \in \{0, 1\}$ be the binary random variable representing the bits at the $k$th modulator's
input, where the pmf $P_{C_k}(u)$ represents the probability of transmitting a bit $u$ at bit position
$k$. We assume that in general $\sum_{k=0}^{m-1} P_{C_k}(0) \neq \sum_{k=0}^{m-1} P_{C_k}(1)$, i.e., the coded and interleaved
sequence could have more zeros than ones (or vice versa). Note that since $P_{C_k}(u)$ is a pmf,

$$P_{C_k}(0) + P_{C_k}(1) = 1.$$

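Let $\boldsymbol{c}_i = [c_{i,0}, \ldots, c_{i,m-1}]$ be the binary label of the symbol $\boldsymbol{x}_i$. We assume that the bits at the
input of the modulator are independent, and therefore, the input symbol probabilities are

$$P_{\boldsymbol{X}}(\boldsymbol{x}_i) = \prod_{k=0}^{m-1} P_{C_k}(c_{i,k}). \tag{30}$$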

The independence condition on the coded bits that results in (30) can be obtained if the
interleaver block in Fig. 1 completely breaks the temporal correlation of the coded bits. The
condition that the coded and interleaved sequence could be asymmetric can be obtained for
example by using an encoder with nonuniform outputs, or by a particular puncturing scheme
applied to the coded bits. This can be combined with the use of multiple interleavers and
multiplexing [52], which would allow $P_{C_k}(u) \neq 1/2$. Examples of how to construct a BICM
scheme where nonuniform input symbol distributions are obtained include the "shaping encoder"
of [53], [54] and the nonuniform signaling scheme based on a Huffman code of [55].

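For future use, we also define the conditional input symbol probabilities, conditioned on the
$k$th bit being $u$, as

$$P_{\boldsymbol{X}|C_k=u}(\boldsymbol{x}_i) = \begin{cases} \prod_{k'=0,\,k' \neq k}^{m-1} P_{C_{k'}}(c_{i,k'}), & \text{if } c_{i,k} = u \\ 0, & \text{if } c_{i,k} \neq u \end{cases} = \begin{cases} \dfrac{P_{\boldsymbol{X}}(\boldsymbol{x}_i)}{P_{C_k}(u)}, & \text{if } i \in \mathcal{I}_{k,u} \\ 0, & \text{if } i \notin \mathcal{I}_{k,u} \end{cases} \tag{31}$$

where $\mathcal{I}_{k,u}$ is defined in Sec. II-C.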
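Equations (30) and (31) translate directly into code. The sketch below computes the symbol probabilities induced by independent bit probabilities and the conditional probabilities given one bit; the function names and the parameter `p_one` (the list of probabilities that each bit equals one) are assumptions of this example, and `p_one[k]` must be strictly between 0 and 1 for the conditional probabilities to be well defined.

```python
def symbol_probs(labeling, p_one):
    """Symbol probabilities (30) for independent bits with Pr{C_k = 1} = p_one[k]."""
    probs = []
    for c in labeling:
        p = 1.0
        for k, bit in enumerate(c):
            p *= p_one[k] if bit == 1 else 1.0 - p_one[k]
        probs.append(p)
    return probs


def index_set(labeling, k, u):
    """Indexes i of the symbols whose label has bit u at position k."""
    return [i for i, c in enumerate(labeling) if c[k] == u]


def conditional_probs(labeling, p_one, k, u):
    """Conditional symbol probabilities (31) given C_k = u."""
    px = symbol_probs(labeling, p_one)
    p_bit = p_one[k] if u == 1 else 1.0 - p_one[k]
    members = set(index_set(labeling, k, u))
    return [px[i] / p_bit if i in members else 0.0 for i in range(len(labeling))]


if __name__ == "__main__":
    brgc3 = [[0, 0, 0], [0, 0, 1], [0, 1, 1], [0, 1, 0],
             [1, 1, 0], [1, 1, 1], [1, 0, 1], [1, 0, 0]]
    p_one = [0.5, 0.25, 0.5]   # only the middle bit is nonuniform
    print(symbol_probs(brgc3, p_one))
    print(conditional_probs(brgc3, p_one, 1, 1))
```

With this choice, the symbols whose middle bit is zero (the outer points when the BRGC labels a PAM alphabet in its natural order) are three times as likely as the others, which illustrates how bit-level probabilities shape the symbol distribution.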



D. BICM Capacity 

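The "BICM channel" in Fig. 1 was introduced in [56] and it is what separates the encoder
and decoder in a BICM system. The BICM capacity is then defined as the capacity of the BICM
channel. Using the definitions in Sec. III-C and the equivalent channel model in [13, Fig. 6],
which replaces the BICM channel by $m$ parallel binary-input continuous-output channels, the
BICM capacity for a given constellation $\Omega$ is defined as

$$I_{\Omega}^{\mathrm{BI}}(\mathsf{SNR}) \triangleq \sum_{k=0}^{m-1} I_{C_k}(C_k; \boldsymbol{Y}) \tag{32}$$

$$= \sum_{k=0}^{m-1} \mathbb{E}_{C_k,\boldsymbol{Y},\boldsymbol{H}}\left[\log_2 \frac{p_{\boldsymbol{Y}|C_k,\boldsymbol{H}}(\boldsymbol{Y})}{p_{\boldsymbol{Y}|\boldsymbol{H}}(\boldsymbol{Y})}\right] \tag{33}$$

$$= \sum_{k=0}^{m-1} \sum_{u \in \{0,1\}} P_{C_k}(u)\, \mathbb{E}_{\boldsymbol{Y},\boldsymbol{H}|C_k=u}\left[\log_2 \frac{p_{\boldsymbol{Y}|\boldsymbol{H},C_k=u}(\boldsymbol{Y})}{p_{\boldsymbol{Y}|\boldsymbol{H}}(\boldsymbol{Y})}\right] \tag{34}$$

$$= \sum_{k=0}^{m-1} \sum_{u \in \{0,1\}} \int_{\mathbb{R}^N} p_{\boldsymbol{H}}(\boldsymbol{h}) \sum_{i \in \mathcal{I}_{k,u}} P_{\boldsymbol{X}}(\boldsymbol{x}_i) \int_{\mathbb{R}^N} p_{\boldsymbol{Y}|\boldsymbol{X}=\boldsymbol{x}_i,\boldsymbol{H}=\boldsymbol{h}}(\boldsymbol{y}) \log_2 \frac{\frac{1}{P_{C_k}(u)} \sum_{j \in \mathcal{I}_{k,u}} P_{\boldsymbol{X}}(\boldsymbol{x}_j)\, p_{\boldsymbol{Y}|\boldsymbol{X}=\boldsymbol{x}_j,\boldsymbol{H}=\boldsymbol{h}}(\boldsymbol{y})}{\sum_{\boldsymbol{x} \in \mathcal{X}} P_{\boldsymbol{X}}(\boldsymbol{x})\, p_{\boldsymbol{Y}|\boldsymbol{X}=\boldsymbol{x},\boldsymbol{H}=\boldsymbol{h}}(\boldsymbol{y})}\, d\boldsymbol{y}\, d\boldsymbol{h}, \tag{35}$$

where (35) follows from (34), (31), and the fact that the value of $C_k$ does not affect the conditional
channel transition probability, i.e., $p_{\boldsymbol{Y}|\boldsymbol{X}=\boldsymbol{x},\boldsymbol{H}=\boldsymbol{h},C_k=u}(\boldsymbol{y}) = p_{\boldsymbol{Y}|\boldsymbol{X}=\boldsymbol{x},\boldsymbol{H}=\boldsymbol{h}}(\boldsymbol{y})$. The BICM capacity
in (35) is a general expression that depends on all the constellation parameters $\Omega$. This can be
numerically implemented using Gauss-Hermite quadratures, or alternatively, by using a
one-dimensional integration based on the pdf of the L-values developed in [35], [37]-[39].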

The AMIs $I_{C_k}(C_k;Y)$ in (32) are, in contrast to the ones in (29), not conditioned on the
previous bit values. Because of this, and unlike the CM capacity, the binary labeling strongly
affects the BICM capacity $I^{\mathrm{BI}}_{\Omega}(\mathrm{SNR})$. Note that the BICM capacity is equivalent to
the capacity achieved by MLC with (suboptimal) parallel decoding of the individual bit levels,
because in BICM, the bits are treated as independent [51]. The differences are that BICM uses
only one encoder, and that in BICM the equivalent channels are not used in parallel, but time
multiplexed. Again, following the standard terminology^6 used in the literature (cf. [13], [21],
[23], [48], [49]), we use the name "BICM capacity" even though no optimization over the input
distribution is performed.

If all the bits at the input of the modulator are equally likely, i.e., $P_{C_k}(u) = 1/2$ for $k =
0, \ldots, m-1$ and $u \in \{0, 1\}$, we obtain from (30) $P_X(x) = 1/M$. Under these constraints, and
assuming an AWGN channel ($H = 1$), the BICM capacity in (35) is given by

$$I^{\mathrm{BI}}_{\Omega}(\mathrm{SNR}) = \frac{1}{M} \sum_{k=0}^{m-1} \sum_{u\in\{0,1\}} \sum_{i\in\mathcal{I}_{k,u}} \int p_{Y|X=x_i}(y)\, \log_2 \frac{2 \sum_{j\in\mathcal{I}_{k,u}} p_{Y|X=x_j}(y)}{\sum_{j=0}^{M-1} p_{Y|X=x_j}(y)}\, dy, \qquad (36)$$

where the constellation is $\Omega = [\mathbb{X}, \mathbb{L}, \mathbb{U}_M]$. This expression coincides with the "standard" BICM
capacity formula (cf. [21, Sec. 3.2.1], [13, eq. (15)], [48, eq. (11)]).

^6 It is also called parallel decoding capacity in [26], or receiver constrained capacity in [46].
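The uniform-input AWGN expression for the BICM capacity can be evaluated with a simple one-dimensional numerical integration. The sketch below is not the Gauss-Hermite implementation mentioned in the text; it uses a plain trapezoidal rule and assumes a real-valued channel with noise variance $N_0/2$ and $\mathrm{SNR} = E_s/N_0$. The function names, the labeling constructions, and the grid size are illustrative assumptions.

```python
import math


def nbc_labels(m):
    """Natural binary code: row i is the m-bit binary representation of i (MSB first)."""
    return [tuple((i >> (m - 1 - k)) & 1 for k in range(m)) for i in range(2 ** m)]


def brgc_labels(m):
    """Binary reflected Gray code: row i is the binary representation of i XOR (i >> 1)."""
    return [tuple(((i ^ (i >> 1)) >> (m - 1 - k)) & 1 for k in range(m))
            for i in range(2 ** m)]


def bicm_capacity(points, labels, snr, n_grid=2001):
    """Uniform-input AWGN BICM capacity in bit/symbol (trapezoidal integration over y)."""
    M, m = len(points), len(labels[0])
    es = sum(x * x for x in points) / M      # average symbol energy
    var = es / (2.0 * snr)                   # ASSUMPTION: noise variance N0/2, SNR = Es/N0
    sigma = math.sqrt(var)
    lo, hi = min(points) - 8.0 * sigma, max(points) + 8.0 * sigma
    dy = (hi - lo) / (n_grid - 1)
    norm = 1.0 / math.sqrt(2.0 * math.pi * var)
    total = 0.0
    for n in range(n_grid):
        y = lo + n * dy
        weight = dy * (0.5 if n in (0, n_grid - 1) else 1.0)  # trapezoid weights
        pdf = [norm * math.exp(-(y - x) ** 2 / (2.0 * var)) for x in points]
        p_all = sum(pdf)
        for k in range(m):
            for u in (0, 1):
                # sum of p(y|x_j) over the symbols whose k-th bit equals u
                p_sub = sum(p for p, lab in zip(pdf, labels) if lab[k] == u)
                if p_sub > 0.0:
                    total += weight * p_sub * math.log2(2.0 * p_sub / p_all)
    return total / M


pam8 = [-7, -5, -3, -1, 1, 3, 5, 7]
for snr_db in (-10, 10):
    snr = 10 ** (snr_db / 10)
    print(snr_db, "dB: NBC", bicm_capacity(pam8, nbc_labels(3), snr),
          "BRGC", bicm_capacity(pam8, brgc_labels(3), snr))
```

Under these assumptions, the NBC gives the larger value at the low SNR and the BRGC at the high SNR, in line with the labeling-dependent behavior discussed in the text.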

One relevant question here is what the optimum labeling is from a capacity maximization
point of view. Once this question is answered, approaching the fundamental limit will depend
only on a good design of the channel encoder/decoder. Caire et al. conjectured the optimality of
the BRGC, which, as the next example shows, does not hold at all SNRs. The conjecture was first
disproved in [24] for PAM input alphabets based on an exhaustive search of binary labelings up to $M = 8$.

Example 3 (CM and BICM Capacities for the AWGN channel, 8-PAM, and $\mathbb{U}_8$): In Fig. 2, we
show the BICM capacity in (36) and the CM capacity in (25) for 8-PAM, $\mathbb{P} = \mathbb{U}_8$, and the four
binary labelings in Example 1. Fig. 2 (a) illustrates that the difference between the CM capacity
and the BICM capacity is small if the binary labeling is properly selected. The best of the four
binary labelings is the NBC for low SNR ($R_c < 0.43$ bit/symbol), the FBC for medium SNR
($0.43 < R_c < 1.09$ bit/symbol), and the BRGC for high SNR ($R_c > 1.09$ bit/symbol). Hence,
the BRGC is suboptimal in at least 36% of the $R_c$ range. The gap between the CM capacity
and the BICM capacity for the BSGC is quite large at low to moderate SNR. The low-SNR
behavior is better elucidated in Fig. 2 (b), where the same capacity curves are plotted versus
$E_b/N_0$ instead of SNR. Interestingly, the CM capacity and the BICM capacity using the NBC
achieve the SL at asymptotically low rates; Gaussian inputs are not necessary, cf. [58, Sec. I].

Formally, $E_b/N_0$ is bounded from below by $f(R_c)$, where

$$f(R_c) \triangleq \frac{C^{-1}(R_c)}{\mathbb{E}_H[H^2]\, R_c}. \qquad (37)$$

This function always exists, because the capacity^7 $C(\mathrm{SNR})$ is a strictly increasing^8 function of
SNR and thus invertible, while in contrast $f(R_c)$ is in general not monotone. This is the reason

^7 From now on we will refer to "capacity" using the notation $C(\mathrm{SNR})$ in a broad sense. $C(\mathrm{SNR})$ can be the AWGN capacity
$C^{\mathrm{AW}}(\mathrm{SNR})$ in (22), the CM capacity $I^{\mathrm{CM}}_{\Omega}(\mathrm{SNR})$ in (25), or the BICM capacity $I^{\mathrm{BI}}_{\Omega}(\mathrm{SNR})$.

^8 This can be proved using the relation between the AMI and the minimum mean square error (MMSE) presented in [59],
i.e., that the derivative of the AMI with respect to SNR is proportional to the MMSE for any SNR. Since the MMSE is a strictly
decreasing function of SNR, the AMI is a strictly increasing function of SNR.

Figure 2. CM capacity and BICM capacity for 8-PAM with $\mathbb{U}_8$ using the four labelings defined in Sec. II-B, plotted vs. SNR
(a) and $E_b/N_0$ (b), and their corresponding functions $g(R_c)$ (c). The shadowed regions represent the achievable rates using the
BSGC. The black squares represent the minimum $E_b/N_0$ for the BSGC. The black circles represent the $E_b/N_0$ needed for a
rate ii = 1/15 turbo code to reach BER = 10'^ (cf. Sec lyFAt . 
why a given $E_b/N_0$ for some labelings maps to more than one capacity value, as shown in
[21]. The phenomenon can be understood by considering the function $I^{\mathrm{BI}}_{\Omega}(\mathrm{SNR})$ in a linear SNR
scale, instead of logarithmic as in Fig. 2 (a). If plotted, the function would pass through the
origin for all labelings. Furthermore, any straight line through the origin represents a constant
$\mathbb{E}_H[H^2]E_b/N_0$ by (37), where the slope is determined by the value of $\mathbb{E}_H[H^2]E_b/N_0$. Such a line
cannot intersect $I^{\mathrm{BI}}_{\Omega}(\mathrm{SNR})$ more than once for SNR > 0, if $I^{\mathrm{BI}}_{\Omega}(\mathrm{SNR})$ is concave. This is the case
for the BRGC, NBC, and FBC, and therefore the function $R_c = f^{-1}(\mathbb{E}_H[H^2]E_b/N_0)$ exists, as
illustrated in Fig. 2 (b). However, for some labelings such as the BSGC (and many others
shown in [49, Fig. 3.5]), $I^{\mathrm{BI}}_{\Omega}(\mathrm{SNR})$ is not concave and $f(R_c)$ is not invertible. This phenomenon
has also been observed for linear precoding for BICM with iterative demapping and decoding
[60, Fig. 3], punctured turbo codes [61, Fig. 3], and incoherent M-ary PSK [62, Figs. 2 and 5]
and frequency-shift keying channels [63, Figs. 1 and 6].

Since analytical expressions for the inverse function of the capacity are usually not available,
expressions for $f(R_c)$ are rare in the literature. One well-known exception is the capacity of the
Gaussian channel given by (22), for which

$$f^{\mathrm{AW}}(R_c) = \frac{N}{2R_c}\left(2^{2R_c/N} - 1\right), \qquad (38)$$

which results in the SL

$$\lim_{R_c \to 0^+} f^{\mathrm{AW}}(R_c) = \log_e(2) = -1.59\ \mathrm{dB}. \qquad (39)$$

Analogously, we will use the notation $f^{\mathrm{CM}}_{\Omega}(R_c)$ and $f^{\mathrm{BI}}_{\Omega}(R_c)$ when the capacity considered is
the CM and the BICM capacity, respectively.^9
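The closed-form bound for the Gaussian channel and its zero-rate limit are easy to evaluate. The sketch below assumes the expression $f^{\mathrm{AW}}(R_c) = \frac{N}{2R_c}(2^{2R_c/N} - 1)$; the function names are illustrative, and `math.expm1` is used only to keep the evaluation accurate at very low rates.

```python
import math


def f_awgn(rate, dim=1):
    """Lower bound on E_b/N_0 (linear scale) for the AWGN channel at rate R_c > 0."""
    return dim / (2.0 * rate) * math.expm1(2.0 * rate * math.log(2.0) / dim)


def to_db(x):
    """Convert a linear power ratio to decibels."""
    return 10.0 * math.log10(x)


for rate in (1e-6, 0.5, 1.0, 2.0):
    print(f"R_c = {rate:g} bit/symbol: f = {to_db(f_awgn(rate)):.2f} dB")
```

As the rate tends to zero, the printed value approaches the SL of about -1.59 dB, and it grows with the rate.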

The results in Fig. 2 (a)-(b) suggest a more general question: What are the optimal constellations
for BICM at a given SNR? To formalize this question, and in analogy to the maximum
CM capacity in (28), we define the maximum BICM capacity as

$$C^{\mathrm{BI}}(\mathrm{SNR}) \triangleq \max_{\Omega} I^{\mathrm{BI}}_{\Omega}(\mathrm{SNR}), \qquad (40)$$

where the optimization is in this case over the three parameters defining $\Omega$. In analogy to the
maximum CM capacity, the maximum BICM capacity represents an upper bound on the number
of bits per symbol that can be reliably transmitted using a fully optimized BICM system, i.e., a
system where for each SNR, the constellation is selected to maximize the BICM capacity.

^9 The same notation convention will be used for other functions that will be introduced later in the paper.

We conclude this subsection by expressing the BICM capacity as a difference of AMIs and
conditional AMIs, which will facilitate the analysis in Sec. IV. The following result is a somewhat
straightforward generalization of [21, Proposition 1], [47, eq. (65)] to $N$-dimensional input
alphabets, nonuniform input distributions, and fading channels.

Theorem 2: The BICM capacity can be expressed as

$$I^{\mathrm{BI}}_{\Omega}(\mathrm{SNR}) = \sum_{k=0}^{m-1} \sum_{u\in\{0,1\}} P_{C_k}(u)\left[I_X(X;Y) - I_{X|C_k=u}(X;Y)\right]. \qquad (41)$$

Proof: For any function $e(X,Y,H)$,

$$\mathbb{E}_{H,Y|C_k=u}\left[\log_2 \frac{p_{Y|H,C_k=u}(Y)}{p_{Y|H}(Y)}\right] = \mathbb{E}_{X,Y,H|C_k=u}\left[\log_2 \frac{e(X,Y,H)}{p_{Y|H}(Y)} - \log_2 \frac{e(X,Y,H)}{p_{Y|H,C_k=u}(Y)}\right].$$

Using this relation in (34), letting $e(X,Y,H) = p_{Y|X,H}(Y) = p_{Y|X,H,C_k=u}(Y)$, observing
that the first term is independent of $u$, and utilizing (19) and (24) yields the theorem. □

E. Minimum $E_b/N_0$ for Reliable Transmission

In this section, we determine the minimum $E_b/N_0$ that permits reliable transmission, for a
given input alphabet and labeling. As observed in Fig. 2 (b), this minimum does not necessarily
occur at rate $R_c = 0$.

Theorem 3 (Minimum $E_b/N_0$): The minimum $E_b/N_0$ is given by $f(R_c)$, where $R_c = 0$ or $R_c$
is one of the solutions of $g(R_c) = 0$, where

$$g(R_c) \triangleq \frac{d f(R_c)}{dR_c} \propto \frac{1}{R_c}\frac{dC^{-1}(R_c)}{dR_c} - \frac{C^{-1}(R_c)}{R_c^2}. \qquad (42)$$

Proof: Any smooth function has a minimum given by the solution of its first derivative
equal to zero or at the extremes of the considered interval. Since in general $0 < R_c < \infty$, two
extreme cases should be considered. However, $\lim_{R_c\to\infty} f^{\mathrm{AW}}(R_c) = \lim_{R_c\to m} f_{\Omega}(R_c) = \infty$,
and therefore, the only extreme point of interest is $R_c = 0$. □

Since $f(R_c)$ is in general not known analytically, the function $g(R_c)$ must be numerically
evaluated using $C(\mathrm{SNR})$. An exception to this is the capacity of the AWGN channel, where
$g^{\mathrm{AW}}(R_c)$ can be calculated analytically. Moreover, it can be proved that in this case, a minimum
$E_b/N_0$ for nonzero rates does not exist.



December 9, 2010 



DRAFT 



21 



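Corollary 4 (Minimum $E_b/N_0$ for the AWGN channel): The minimum $E_b/N_0$ for the AWGN
channel is unique, and it is obtained for zero-rate transmissions.
Proof: The derivative of $f^{\mathrm{AW}}(R_c)$ in (38) is given by

$$g^{\mathrm{AW}}(R_c) = \frac{N + 2^{2R_c/N}\left(R_c \log_e 4 - N\right)}{2R_c^2}. \qquad (43)$$

To prove that a zero for a nonzero rate does not exist, we need to prove that the numerator of (43)
is positive for $R_c > 0$, since then $g^{\mathrm{AW}}(R_c) > 0$ for $R_c > 0$. This follows because the numerator
tends to zero as $R_c \to 0^+$ and its first derivative is strictly positive:

$$\frac{d}{dR_c}\left[N + 2^{2R_c/N}\left(R_c \log_e 4 - N\right)\right] = \frac{4R_c \log_e^2 2}{N}\, 2^{2R_c/N} > 0 \quad \text{for } R_c > 0.$$

□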
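The argument in the proof can be cross-checked numerically. The sketch below assumes the closed forms $f^{\mathrm{AW}}(R_c) = \frac{N}{2R_c}(2^{2R_c/N}-1)$ and $g^{\mathrm{AW}}(R_c) = \frac{N + 2^{2R_c/N}(R_c\log_e 4 - N)}{2R_c^2}$, compares the latter with a central finite difference of the former, and checks its sign on a few rates. The function names and test rates are illustrative.

```python
import math


def f_awgn(rate, dim=1):
    """f^AW(R_c) = N/(2 R_c) * (2^(2 R_c/N) - 1)."""
    return dim / (2.0 * rate) * math.expm1(2.0 * rate * math.log(2.0) / dim)


def g_awgn(rate, dim=1):
    """Closed-form derivative of f^AW with respect to R_c."""
    power = 2.0 ** (2.0 * rate / dim)
    return (dim + power * (rate * math.log(4.0) - dim)) / (2.0 * rate ** 2)


def finite_difference(func, x, step=1e-6):
    """Central finite-difference approximation of func'(x)."""
    return (func(x + step) - func(x - step)) / (2.0 * step)


for rate in (0.05, 0.5, 1.0, 3.0):
    print(rate, g_awgn(rate), finite_difference(f_awgn, rate))
```

The two columns agree to within the finite-difference error, and every value is positive, so $f^{\mathrm{AW}}$ has no stationary point at a nonzero rate on the tested grid.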

In Fig. 2 (c), we present the function $g(R_c)$ in (42) for the same constellations presented
in Fig. 2 (a)-(b). If $g(R_c) = 0$ has at least one solution for $R_c > 0$, the capacity curve plotted
versus $E_b/N_0$ will have a local minimum in $E_b/N_0$ (shown with a filled square in Fig. 2 (b)-(c) for
the BSGC). Note also that the BSGC has an interesting property, namely,
$\lim_{R_c\to 0^+} g^{\mathrm{BI}}_{\Omega}(R_c) = -\infty$, and consequently, $\lim_{R_c\to 0^+} f^{\mathrm{BI}}_{\Omega}(R_c) = +\infty$. In this sense,
the BSGC is an extremely bad labeling for M-PAM input alphabets and asymptotically low rates.



F. Probabilistic Shaping 

The maximum BICM capacity in (40) is an optimization problem for which analytical solutions
are unknown. In this subsection, we study the solution of (40) when the input alphabet and the
binary labeling are kept constant, i.e., we study the so-called probabilistic shaping. Formally, we
want to solve $\mathbb{P}^*(\mathrm{SNR}) = \arg\max_{\mathbb{P}} I^{\mathrm{BI}}_{\Omega}(\mathrm{SNR})$, where $\Omega = [\mathbb{X}, \mathbb{L}, \mathbb{P}]$, for a given input alphabet $\mathbb{X}$
and labeling $\mathbb{L}$. Since this optimization problem turns out to have multiple local optima and no
analytical methods are known for solving it, we perform a grid search with steps of 0.01 based
on Gauss-Hermite quadratures. The optimization is performed over the three variables defining
the input distribution: $P_{C_0}(0)$, $P_{C_1}(0)$, and $P_{C_2}(0)$. For each SNR value, the input distribution
that maximizes the BICM capacity is selected.

In Fig. 3, we show the BICM capacity for an 8-PAM input alphabet labeled by the BRGC and
the NBC, when the optimized input distributions are used. We use the notation $\Omega^* = [\mathbb{X}, \mathbb{L}, \mathbb{P}^*]$.
The results in this figure show how, by properly selecting the input distribution, the BICM
capacity can be increased. The gap between the BICM capacity and the AWGN capacity is
almost completely eliminated for $R_c < 2$ bit/symbol (in contrast to a gap of approximately
1 dB in Fig. 2 (b)). Similar results have been presented recently in [64] for 4-PAM. Interestingly,
Fig. 3 shows that if the input distribution is optimized, the NBC is not the optimal binary labeling
for low SNR anymore, but the BRGC with an optimized input distribution achieves the SL. This
is also the case for the FBC, but we do not show those results so as not to overcrowd the figure.

IV. BICM for Asymptotically Low Rates

In this section, we are interested in finding an asymptotic expansion for the CM and the
BICM capacities when $\mathrm{SNR} \to 0$.

A. Relation between AWGN and BICM capacity

We start by proving that the BICM capacity can be optimal in the sense of being equal to the
AWGN capacity only for zero rate. This very simple result motivates the developments that
follow, which characterize the behavior of BICM for asymptotically low SNR.

Theorem 5: The AWGN capacity, the CM capacity, and the BICM capacity are related through
the following two inequalities.
i) $I^{\mathrm{CM}}_{\Omega}(\mathrm{SNR}) \le C^{\mathrm{AW}}(\mathrm{SNR})$ with equality if and only if SNR = 0, and
ii) $I^{\mathrm{BI}}_{\Omega}(\mathrm{SNR}) \le I^{\mathrm{CM}}_{\Omega}(\mathrm{SNR})$.

Proof: We start by proving that $I^{\mathrm{AW}}_{\Omega}(\mathrm{SNR}) < C^{\mathrm{AW}}(\mathrm{SNR})$ for SNR > 0, where $I^{\mathrm{AW}}_{\Omega}(\mathrm{SNR})$
is the CM capacity of the AWGN channel. Using $I_X(X;Y) = h(Y) - h(Z)$, we
express the CM capacity for a given $\Omega$ in terms of differential entropies as $I^{\mathrm{AW}}_{\Omega}(\mathrm{SNR}) =
h(Y) - \log_2(2\pi N_0 e)$. Since the differential entropy $h(Y) = -\int p_Y(y) \log_2 p_Y(y)\, dy$
is maximized if and only if $Y$ is Gaussian distributed [44, Theorem 8.6.5], the use of any
constellation (discrete input alphabet) will give a smaller differential entropy $h(Y)$ than for
a Gaussian $Y$, which proves that $I^{\mathrm{AW}}_{\Omega}(\mathrm{SNR}) < C^{\mathrm{AW}}(\mathrm{SNR})$ for SNR > 0.

We now prove that $I^{\mathrm{CM}}_{\Omega}(\mathrm{SNR}) < I^{\mathrm{AW}}_{\Omega}(\mathrm{SNR})$ for any fading channel and SNR > 0. To do this,
we note that the CM capacity for fading channels is equal to the CM capacity for the AWGN
channel averaged over the distribution of the instantaneous SNR. Furthermore, $I^{\mathrm{AW}}_{\Omega}(\mathrm{SNR})$ is a
strictly concave function of SNR for SNR > 0, because the second derivative of the AMI as a
function of the SNR (the first derivative of the MMSE, see footnote 8) is strictly negative for
SNR > 0 [65, Proposition 5], [66, Proposition 7]. Therefore, Jensen's inequality holds, which
yields $I^{\mathrm{CM}}_{\Omega}(\mathrm{SNR}) = \mathbb{E}_H[I^{\mathrm{AW}}_{\Omega}(H^2 E_s/N_0)] < I^{\mathrm{AW}}_{\Omega}(\mathbb{E}_H[H^2]E_s/N_0) = I^{\mathrm{AW}}_{\Omega}(\mathrm{SNR})$ for SNR > 0.
This and the fact that $I^{\mathrm{CM}}_{\Omega}(0) = C^{\mathrm{AW}}(0) = 0$ prove item i). The proof of item ii) was presented
in [13, Sec. III]. □

Corollary 6: The BICM capacity and the maximum BICM capacity can be equal to the
AWGN capacity only for zero rates, i.e., $I^{\mathrm{BI}}_{\Omega}(\mathrm{SNR}) = C^{\mathrm{BI}}(\mathrm{SNR}) = C^{\mathrm{AW}}(\mathrm{SNR})$ only if SNR = 0.

Proof: From Theorem 5, we know that for any SNR > 0, the inequality $I^{\mathrm{BI}}_{\Omega}(\mathrm{SNR}) \le
I^{\mathrm{CM}}_{\Omega}(\mathrm{SNR}) < C^{\mathrm{AW}}(\mathrm{SNR})$ holds. Therefore, for any SNR > 0, $I^{\mathrm{BI}}_{\Omega}(\mathrm{SNR}) < C^{\mathrm{AW}}(\mathrm{SNR})$. The
proof for the BICM capacity is completed noting that $I^{\mathrm{BI}}_{\Omega}(0) = C^{\mathrm{AW}}(0) = 0$. The proof for the
maximum BICM capacity follows from the fact that Theorem 5 holds also when an optimization
over $\Omega$ is applied. □

Corollary 6 simply states that the only rate for which the AWGN capacity equals the BICM
capacity and the maximum BICM capacity is $R_c = 0$ (or equivalently SNR = 0). In the following
subsections, we analyze the asymptotic behavior of the BICM capacity when $\mathrm{SNR} \to 0$.

B. A Linear Approximation of the Capacity and the SL

Any capacity function $C(\mathrm{SNR})$ can be approximated using a Taylor expansion around SNR = 0
as $C(\mathrm{SNR}) = a\,\mathrm{SNR} + O(\mathrm{SNR}^2)$. By inversion of power series [67, Sec. 1.3.4.5], we find

$$C^{-1}(R_c) = \frac{1}{a} R_c + O(R_c^2),$$

and using (37), it is possible to obtain a linear approximation of the function $f(R_c)$:

$$f(R_c) = \frac{1}{a} + O(R_c). \qquad (44)$$

For asymptotically low rates, (44) results in

$$\lim_{R_c \to 0^+} f(R_c) = \frac{1}{a}, \qquad (45)$$

and since from (39) $1/a \ge \log_e(2)$, we obtain

$$a \le \log_2 e. \qquad (46)$$

It is clear from (46) that a capacity function $C(\mathrm{SNR})$ that has a coefficient $a = \log_2 e$^10 achieves
the SL -1.59 dB. Moreover, based on the results for the BSGC in Fig. 2 (b), the coefficient
$a$ can be as low as zero.

C. First-Order Asymptotics of the BICM Capacity 

Theorem 7 (Linear approximation of the AMI): When the channel is perfectly known at the
receiver, and for any input distribution $P_X(x)$, the AMI between $X$ and $Y$ in (3) can be
expressed as

$$I_X(X;Y) = a\,\mathrm{SNR} + O(\mathrm{SNR}^2)$$

when $\mathrm{SNR} \to 0$, where

$$a = \log_2 e \left(1 - \frac{\|\mathbb{E}_X[X]\|^2}{E_s}\right). \qquad (47)$$

Proof: The proof is given in Appendix A. □
Theorem 7 shows how to calculate the first-order asymptotics of an AMI with arbitrary input
distribution. The following corollary follows directly from the definition of the CM capacity in
(25), where the input distribution is given by (30).

^10 Or equivalently, if we measure the AMI in nats, $a = \log_e e = 1$.

Corollary 8 (Coefficient $a^{\mathrm{CM}}_{\Omega}$): The CM capacity can be expressed as

$$I^{\mathrm{CM}}_{\Omega}(\mathrm{SNR}) = a^{\mathrm{CM}}_{\Omega}\,\mathrm{SNR} + O(\mathrm{SNR}^2)$$

when $\mathrm{SNR} \to 0$, where $a^{\mathrm{CM}}_{\Omega}$ is given by (47).

The next theorem gives the first-order asymptotics for the BICM capacity.

Theorem 9 (Coefficient $a^{\mathrm{BI}}_{\Omega}$): The coefficient $a$ for the BICM capacity $I^{\mathrm{BI}}_{\Omega}(\mathrm{SNR})$ is given by

$$a^{\mathrm{BI}}_{\Omega} = \frac{\log_2 e}{E_s}\left[\sum_{k=0}^{m-1} \sum_{u\in\{0,1\}} P_{C_k}(u)\, \|\mathbb{E}_{X|C_k=u}[X]\|^2 - m\,\|\mathbb{E}_X[X]\|^2\right]. \qquad (48)$$

Proof: Reordering the result of Theorem 2, we have that

$$I^{\mathrm{BI}}_{\Omega}(\mathrm{SNR}) = \sum_{k=0}^{m-1} \left\{ I_X(X;Y) - \sum_{u\in\{0,1\}} P_{C_k}(u)\, I_{X|C_k=u}(X;Y) \right\}.$$

Since $I_X(X;Y)$ and $I_{X|C_k=u}(X;Y)$ are AMIs, we can apply (47) to each of them, which gives

$$a^{\mathrm{BI}}_{\Omega} = \frac{\log_2 e}{E_s} \sum_{k=0}^{m-1} \left\{ E_s - \|\mathbb{E}_X[X]\|^2 - \sum_{u\in\{0,1\}} P_{C_k}(u)\left( \mathbb{E}_{X|C_k=u}[\|X\|^2] - \|\mathbb{E}_{X|C_k=u}[X]\|^2 \right) \right\}.$$

We recognize $\sum_{u\in\{0,1\}} P_{C_k}(u)\, \mathbb{E}_{X|C_k=u}[\|X\|^2]$ as the average symbol energy $E_s$, which
completes the proof. □
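The first-order coefficient of Theorem 9 depends only on the conditional means of the symbols given each bit, so it can be computed directly. The sketch below assumes the form $a^{\mathrm{BI}}_{\Omega} = \frac{\log_2 e}{E_s}\big[\sum_k \sum_u P_{C_k}(u)\|\mathbb{E}_{X|C_k=u}[X]\|^2 - m\|\mathbb{E}_X[X]\|^2\big]$ and evaluates it for 8-PAM and 8-PSK with uniform inputs. The function names, the label constructions, and the PSK phase reference are illustrative assumptions.

```python
import math

LOG2E = math.log2(math.e)


def nbc_labels(m):
    """Natural binary code, MSB first."""
    return [tuple((i >> (m - 1 - k)) & 1 for k in range(m)) for i in range(2 ** m)]


def brgc_labels(m):
    """Binary reflected Gray code, MSB first."""
    return [tuple(((i ^ (i >> 1)) >> (m - 1 - k)) & 1 for k in range(m))
            for i in range(2 ** m)]


def coefficient_bicm(points, labels, probs=None):
    """First-order BICM coefficient; points are N-dimensional tuples."""
    M, m, dim = len(points), len(labels[0]), len(points[0])
    probs = probs if probs is not None else [1.0 / M] * M
    es = sum(p * sum(c * c for c in x) for p, x in zip(probs, points))
    mean = [sum(p * x[d] for p, x in zip(probs, points)) for d in range(dim)]
    acc = -m * sum(c * c for c in mean)
    for k in range(m):
        for u in (0, 1):
            p_c = sum(p for p, lab in zip(probs, labels) if lab[k] == u)
            if p_c == 0.0:
                continue
            # s = P_{C_k}(u) * E[X | C_k = u]
            s = [sum(p * x[d] for p, x, lab in zip(probs, points, labels) if lab[k] == u)
                 for d in range(dim)]
            acc += sum(c * c for c in s) / p_c
    return LOG2E * acc / es


pam8 = [(float(x),) for x in range(-7, 8, 2)]
psk8 = [(math.cos(2 * math.pi * i / 8), math.sin(2 * math.pi * i / 8)) for i in range(8)]
for name, pts in (("8-PAM", pam8), ("8-PSK", psk8)):
    print(name, "NBC:", coefficient_bicm(pts, nbc_labels(3)) / LOG2E,
          "BRGC:", coefficient_bicm(pts, brgc_labels(3)) / LOG2E)
```

The printed values are normalized by $\log_2 e$, so a value of one corresponds to achieving the SL.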

The first-order coefficients of the expansion of the CM and BICM capacities in Corollary 8 and
Theorem 9 do not depend on the fading. This simply states that, under the constraints imposed
on $H$, the fading has no effect on the first-order behavior of the BICM capacity. Consequently,
the analysis of the optimal constellations for fading channels at low SNR can be reduced, without
loss of generality, to the AWGN case.

Corollary 8 and Theorem 9 generalize the results in [21], [22] by considering constellations
with nonuniform input distributions and arbitrary dimensions, mean, and variance. This
generalization will allow us to analyze optimal constellations $\Omega$ in the next section.

In general, we know from (46) that $a^{\mathrm{BI}}_{\Omega} \le \log_2 e$, which can be interpreted as the penalty
of a certain BICM system over an optimal CM system (without interleaving). In the following
section we analyze $a^{\mathrm{BI}}_{\Omega}$ for PAM and PSK input alphabets with different binary labelings and
$\mathbb{P} = \mathbb{U}_M$, and we also show how to obtain $a^{\mathrm{BI}}_{\Omega} = \log_2 e$ for general constellations.






V. First-Order Optimal Constellations for BICM

Shannon stated in 1959, "There is a curious and provocative duality between the properties of
a source with a distortion measure and those of a channel" [68]. Many instances of this duality
have been observed during the last 50 years of communications research. A good summary of
this is presented in [45, Sec. V]. The coefficient $a$ is mathematically similar to the so-called
linearity index [40], which was used to indicate the approximate performance of labelings in
a source coding application at high SNR. The usage of the HT in this section was inspired by
the analysis in [40].



A. FOO Constellations

In view of the SL, we define a first-order optimal (FOO) constellation for BICM^11 as a
constellation $\Omega$ that results in a coefficient $a^{\mathrm{BI}}_{\Omega} = \log_2 e$.


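Theorem 10 (Coefficient $a^{\mathrm{BI}}_{\Omega}$ for arbitrary constellations): For any constellation $\Omega$,

$$a^{\mathrm{BI}}_{\Omega} = \frac{\log_2 e}{E_s} \sum_{k=0}^{m-1} \frac{\left\| \sum_{i=0}^{M-1} x_i P_X(x_i)\, q_{i,k} + \left(P_{C_k}(1) - P_{C_k}(0)\right) \mathbb{E}_X[X] \right\|^2}{4\, P_{C_k}(0)\, P_{C_k}(1)}, \qquad (49)$$

where $q_{i,k}$ are the elements of the modified labeling matrix in (13).

Proof: The proof is given in Appendix B. □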

Theorem 10 is a very general theorem valid for any constellation $\Omega$. From this theorem, it is
clear that the problem of designing FOO constellations for BICM has three degrees of freedom:
the input alphabet $\mathbb{X}$, the binary labeling $\mathbb{L}$, and the input distribution $\mathbb{P}$.

From now on, we restrict our attention to uniform input distributions $\mathbb{P}$. This restriction can
be justified from the fact that due to the digital implementation of the transceivers, changing
the input alphabet or the binary labeling can be implemented without complexity increase. On
the other hand, implementation of probabilistic shaping requires a modification of the channel
encoder and/or the interleaver. If $\mathbb{P} = \mathbb{U}_M$, then $P_{C_k}(u) = 1/2$ for $k = 0, 1, \ldots, m-1$ and
$u \in \{0, 1\}$, and (49) simplifies into

$$a^{\mathrm{BI}}_{\Omega} = \frac{\log_2 e}{M^2 E_s} \sum_{k=0}^{m-1} \left\| \sum_{i=0}^{M-1} q_{i,k}\, x_i \right\|^2. \qquad (50)$$

^11 A similar first-order optimality criterion for the CM capacity can be defined. In this case, based on (47), any constellation
based on a zero-mean input alphabet is an FOO constellation for the CM capacity, regardless of the input distribution $P_X(x)$.
Conversely, no FOO constellation can have nonzero mean.

Keeping $\mathbb{X}$ fixed and changing the labeling $\mathbb{L}$ is equivalent to fixing $\mathbb{L}$ and reordering the
rows of $\mathbb{X}$. Therefore, a joint optimization of $\Omega = [\mathbb{X}, \mathbb{L}, \mathbb{U}_M]$ over $\mathbb{X}$ and $\mathbb{L}$ can without loss
of generality be reduced to an optimization over $\mathbb{X}$ only, for an arbitrary $\mathbb{L}$. In the following
analysis, we will hence sometimes fix the labeling to be the NBC, without loss of generality.

The expression for $a^{\mathrm{BI}}_{\Omega}$ in (50) can be simplified further using the HT, as elaborated in the
next theorem.

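Theorem 11 (The HT and $a^{\mathrm{BI}}_{\Omega}$): The coefficient $a^{\mathrm{BI}}_{\Omega}$ for a constellation $\Omega = [\mathbb{X}, \mathbb{N}_m, \mathbb{U}_M]$ is
given by

$$a^{\mathrm{BI}}_{\Omega} = \frac{\log_2 e}{E_s} \sum_{k=0}^{m-1} \|\tilde{x}_{2^k}\|^2,$$

where $\tilde{x}_j$ are elements of the HT of $\mathbb{X}$ defined by (14).
Proof: Using Lemma 1 and (15) in (50), we obtain

$$a^{\mathrm{BI}}_{\Omega} = \frac{\log_2 e}{M^2 E_s} \sum_{k=0}^{m-1} \left\| \sum_{i=0}^{M-1} h_{i,2^k} \sum_{j=0}^{M-1} h_{i,j}\, \tilde{x}_j \right\|^2 = \frac{\log_2 e}{E_s} \sum_{k=0}^{m-1} \|\tilde{x}_{2^k}\|^2.$$

□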



It follows from Theorem 11 and (16) that

$$a^{\mathrm{BI}}_{\Omega} \le \frac{\log_2 e}{E_s} \sum_{j=0}^{M-1} \|\tilde{x}_j\|^2 = \frac{\log_2 e}{M E_s} \sum_{i=0}^{M-1} \|x_i\|^2 = \log_2 e \qquad (51)$$

for any constellation, which is in perfect agreement with (46). We now proceed to determine
the class of input alphabets and labelings for which the bound (51) is tight.

Theorem 12 (Linear projection of a hypercube): A constellation $\Omega = [\mathbb{X}, \mathbb{L}, \mathbb{U}_M]$ is FOO if
and only if there exists an $m \times N$ matrix $\mathbb{V} = [v_0, \ldots, v_{m-1}]^T$ such that

$$\mathbb{X} = \mathbb{Q}(\mathbb{L})\mathbb{V}. \qquad (52)$$

Proof: Consider first the NBC. Equality holds in (51) if and only if $\tilde{x}_j = 0$ for all $j =
0, \ldots, M-1$ except $j = 1, 2, 4, \ldots, 2^{m-1}$. For such input alphabets, the inverse HT yields

$$x_i = \sum_{k=0}^{m-1} h_{i,2^k}\, \tilde{x}_{2^k}.$$

Letting $v_k = \tilde{x}_{2^k}$ for $k = 0, \ldots, m-1$ and using (13), we obtain

$$x_i = \sum_{k=0}^{m-1} q_{i,k}\, v_k, \qquad i = 0, \ldots, M-1. \qquad (53)$$

Letting $\mathbb{V} = [v_0, \ldots, v_{m-1}]^T$ completes the proof for $\mathbb{L} = \mathbb{N}_m$. That the theorem also holds for
an arbitrary labeling follows by synchronously reordering the rows of $\mathbb{X}$ and $\mathbb{L}$, as explained
before Theorem 11. □

Figure 4. The two FOO constellations defined in Example 4 ($m = 3$ and $N = 2$): (a) OTTO constellation, (b) OTOTO constellation.
Graphically, the OTTO constellation in (a) gives the impression of a projected cube. The OTOTO constellation in (b) gives the
impression of a 6-PSK input alphabet with two extra points located at the origin.

Theorem 12 has an appealing geometrical interpretation. Writing the set of constellation points
as in (52), each row of $\mathbb{Q}$ can be interpreted as a vertex of an $m$-dimensional hypercube, and
$\mathbb{V}$ as an $m \times N$ projection matrix. Hence, a constellation for BICM is FOO if and only if it
is a linear projection of a zero-mean hypercube. This interpretation, as well as all
theorems presented so far, holds for an arbitrary dimension $N$. In the rest of this section, we
will exemplify the results for $N = 1$ and 2, because such input alphabets are easily visualized
(Figs. 4-5) and often used in practice (PAM, QAM, and PSK).

Example 4 (OTTO and OTOTO constellations): To exemplify the concept of Theorem 12, we
present two constellations that are FOO. The projection matrices for the "one-three-three-one"
(OTTO) and the "one-two-one-two-one" (OTOTO) constellations are defined as



VoTOTO 



V, 



OTTO 



-1 -1 
+ 1 
-1 +1 



cos(7r/3) sin(7r/3) 
cos(7r/3) — sin(7r/3) 



Both constellations are shown in Fig. 4. The figure illustrates that the minimum Euclidean
distance, which is an important figure-of-merit at high SNR, plays no role at all when constellations
are optimized for low SNR.

Figure 5. Hierarchical 8-PAM constellation with labels 000, 001, 010, 011, 100, 101, 110, 111. The constellation is FOO and
$\mathbb{V} = [-d_0, -d_1, \ldots, -d_{m-1}]^T$.

A particular case of Theorem 12 are the nonequally spaced (NES) M-PAM input alphabets,
as specified in the following corollary.

Corollary 13: If a NES M-PAM input alphabet $\mathbb{X}$ consists of the points $\pm v_0 \pm v_1 \pm \cdots \pm v_{m-1}$,
there exists a binary labeling $\mathbb{L}$ such that the constellation $[\mathbb{X}, \mathbb{L}, \mathbb{U}_M]$ is FOO.

Example 5 (Hierarchical constellations): The so-called "hierarchical constellations" [69]-[71]
are defined by the one-dimensional input alphabet [69, eq. (3)]

$$x_i = \sum_{k=0}^{m-1} \left(2 b_k(i) - 1\right) d_k, \qquad (54)$$

where $b_k(i)$ denotes the bits of the base-2 representation of the integer $i$ with $i = 0, \ldots, M-1$, and where
$d_k > 0$ for $k = 0, \ldots, m-1$ are the distances defining the input alphabet. The additional
condition $x_i < x_{i+1}$ for $i = 0, \ldots, M-2$ is usually imposed so that overlapping points in
the input alphabet are avoided. This condition also keeps the labeling of the input alphabet
unchanged.

In Fig. 5, we show a hierarchical 8-PAM input alphabet. In this figure, the $M$ constellation
points are shown with black circles, while the white squares/triangles represent 2- and 4-PAM
input alphabets from which the 8-PAM input alphabet can be recursively (hierarchically)
constructed.

The binary labeling used in hierarchical constellations is usually assumed to be the BRGC.
In this case, we find that when $\mathbb{X}$ is given by (54), the system in (52) has no solutions for
$\mathbb{V}$, and therefore, the constellation is not FOO. However, if the NBC is used instead (as in
Fig. 5), all hierarchical constellations are FOO, because $\mathbb{X} = \mathbb{Q}(\mathbb{N}_m)\mathbb{V}$ gives a projection matrix
$\mathbb{V} = [-d_0, -d_1, \ldots, -d_{m-1}]^T$.
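The effect of the labeling on a hierarchical alphabet can be illustrated numerically. The sketch below builds a hierarchical 8-PAM alphabet from assumed distances and evaluates the normalized coefficient $a^{\mathrm{BI}}_{\Omega}/\log_2 e$ for uniform inputs, using the specialization of Theorem 9 to equally likely bits, $\frac{1}{M^2E_s}\sum_k \big(\sum_i q_{i,k}x_i\big)^2$ with $q_{i,k} = \pm 1$. The distances, the convention that bit $k = 0$ is the least significant bit, and the function names are hypothetical choices for illustration.

```python
def hierarchical_pam(distances):
    """x_i = sum_k (2 b_k(i) - 1) d_k, with b_k(i) the k-th bit of i (k = 0 is the LSB)."""
    m = len(distances)
    return [sum((2 * ((i >> k) & 1) - 1) * distances[k] for k in range(m))
            for i in range(2 ** m)]


def normalized_coefficient(points, labels):
    """a^BI / log2(e) for uniform inputs; labels[i] is the integer label of points[i]."""
    M = len(points)
    m = M.bit_length() - 1
    es = sum(x * x for x in points) / M
    total = 0.0
    for k in range(m):
        # q_{i,k} = +1 if bit k of the label is 0, and -1 otherwise
        s = sum((1 - 2 * ((lab >> k) & 1)) * x for lab, x in zip(labels, points))
        total += s * s
    return total / (M * M * es)


points = hierarchical_pam([0.5, 1.5, 4.0])   # assumed distances d_0, d_1, d_2
nbc = list(range(8))                          # natural binary labels
brgc = [i ^ (i >> 1) for i in range(8)]       # binary reflected Gray labels
print(points)
print("NBC:", normalized_coefficient(points, nbc))
print("BRGC:", normalized_coefficient(points, brgc))
```

For these distances the NBC gives a normalized coefficient of one, while the BRGC gives a strictly smaller value, which matches the statement that the hierarchical alphabet is FOO with the NBC but not with the BRGC.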
B. Labelings for PAM, QAM, and PSK

While we have so far kept the labeling fixed and searched for good input alphabets, we now
take the opposite approach and search for good labelings for a given input alphabet. In this
section we analyze the practically relevant input alphabets PAM, QAM, and PSK defined in
Sec. II-B. Throughout this section, we assume $\mathbb{P} = \mathbb{U}_M$.

Example 6 (NBC for M-PAM): Let $\mathbb{V} = [v_0, v_1, \ldots, v_{m-1}]^T = [-1, -2, -4, \ldots, -2^{m-1}]^T$
and let $\mathbb{L} = \mathbb{N}_m$. With $\mathbb{Q}(\mathbb{N}_m)$ given by (13), we obtain from (52) the constellation $\mathbb{X}_{\mathrm{PAM}} =
[-M+1, -M+3, \ldots, M-1]^T$, which shows that the constellation $[\mathbb{X}_{\mathrm{PAM}}, \mathbb{N}_m, \mathbb{U}_M]$ is FOO.
In view of Theorem 11, the optimality of M-PAM input alphabets comes from the fact that the
HT of $\mathbb{X}_{\mathrm{PAM}}$ has its only nonzero elements in the $m$ positions $1, 2, 4, \ldots, 2^{m-1}$.
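Example 6 can be reproduced in a few lines. The sketch below assumes that the modified labeling matrix of the NBC has entries $q_{i,k} = (-1)^{b_k(i)}$, with $b_k(i)$ the $k$-th bit of $i$ (bit 0 being the least significant), and that the HT is normalized as $\tilde{x}_j = \frac{1}{M}\sum_i x_i (-1)^{\mathrm{popcount}(i \wedge j)}$. Both conventions are assumptions chosen to be consistent with the example, since the defining equations lie outside this section; the function names are illustrative.

```python
def modified_labeling_nbc(m):
    """Q(N_m): q_{i,k} = +1 if bit k of i is 0 and -1 otherwise (bit 0 = LSB)."""
    return [[1 - 2 * ((i >> k) & 1) for k in range(m)] for i in range(2 ** m)]


def project(q, v):
    """X = Q V for a one-dimensional alphabet (v is a list of m scalars)."""
    return [sum(qk * vk for qk, vk in zip(row, v)) for row in q]


def hadamard_transform(x):
    """x~_j = (1/M) * sum_i x_i * (-1)^popcount(i & j)."""
    M = len(x)
    return [sum(xi * (-1) ** bin(i & j).count("1") for i, xi in enumerate(x)) / M
            for j in range(M)]


m = 3
v = [-(2 ** k) for k in range(m)]              # V = [-1, -2, -4]^T
x_pam = project(modified_labeling_nbc(m), v)   # 8-PAM in natural order
x_ht = hadamard_transform(x_pam)
nonzero = [j for j, val in enumerate(x_ht) if val != 0]
print(x_pam)    # [-7, -5, -3, -1, 1, 3, 5, 7]
print(nonzero)  # [1, 2, 4]
```

Under these conventions the nonzero transform coefficients are exactly the entries of $\mathbb{V}$, and the sum of their squares equals the average symbol energy, which is the equality case of the bound on $a^{\mathrm{BI}}_{\Omega}$.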

It follows from Example 6 that the constellation $[\mathbb{X}_{\mathrm{PAM}}, \mathbb{N}_m, \mathbb{U}_M]$ is FOO, which has also
been shown in [23]. The following theorem states that the NBC is the unique labeling with this
property, apart from trivial bit operations that do not alter the characteristics of the labeling.

Theorem 14: The constellation $[\mathbb{X}_{\mathrm{PAM}}, \mathbb{L}, \mathbb{U}_M]$ is FOO if and only if $\mathbb{L} = \mathbb{N}_m$, or any other
binary labeling that can be derived from the NBC by inverting the bits in certain positions or
by permuting the sequence of bits in all codewords.

Proof: The proof is given in Appendix C. □

In order to extend this result to rectangular QAM constellations, we first state a theorem about
product constellations in general.

Theorem 15: A two-dimensional constellation $[\mathbb{X}, \mathbb{L}, \mathbb{U}_M]$, where $\mathbb{X} = \mathbb{X}' \otimes \mathbb{X}''$ is the ordered
direct product of two one-dimensional input alphabets $\mathbb{X}'$ and $\mathbb{X}''$ and all symbols $x_i$ are distinct,
is FOO if and only if both the following items hold.

• There exist labelings $\mathbb{L}'$ and $\mathbb{L}''$ such that $[\mathbb{X}', \mathbb{L}', \mathbb{U}_{M'}]$ and $[\mathbb{X}'', \mathbb{L}'', \mathbb{U}_{M''}]$ are both FOO
(where $M'$ and $M''$ are the sizes of $\mathbb{X}'$ and $\mathbb{X}''$, resp.).

• $\mathbb{L} = \Pi_c(\mathbb{L}' \otimes \mathbb{L}'')$, where $\Pi_c$ is an arbitrary column permutation.

Proof: The proof is given in Appendix D. □

As a special case, the theorem applies to rectangular QAM constellations since they are defined
as the ordered direct product of two PAM input alphabets. In view of Theorem 14 and since
$\mathbb{N}_{m'} \otimes \mathbb{N}_{m''} = \mathbb{N}_{m'+m''}$, the following corollary gives necessary and sufficient conditions for a
rectangular $(M' \times M'')$-QAM constellation to be FOO.

Corollary 16: A constellation $[\mathbb{X}_{\mathrm{QAM}}, \mathbb{L}, \mathbb{U}_M]$, where $\mathbb{X}_{\mathrm{QAM}}$ is an $(M' \times M'')$-QAM input
alphabet and $M = M'M'' = 2^m$, is FOO if and only if $\mathbb{L} = \mathbb{N}_m$, or any other binary labeling
that can be derived from $\mathbb{N}_m$ by inverting the bits in certain positions or by permuting the
sequence of bits in all codewords.

Can a constellation based on an M-PSK input alphabet be FOO with a suitably chosen
labeling? What about constant-energy constellations in higher dimensions? A complete answer to
these questions is given by the following theorem. An intuitive interpretation is that a constellation
based on a constant-energy input alphabet is FOO if and only if it forms the vertices of an
orthogonal parallelotope, or "hyperrectangle."

Theorem 17: A constellation $[\mathbb{X}, \mathbb{L}, \mathbb{U}_M]$, where $\|x_i\|^2$ is constant for all $i = 0, \ldots, M-1$, is
FOO if and only if $\mathbb{X}$ can be written in the form (52) with orthogonal vectors $v_0, \ldots, v_{m-1}$.
Proof: The proof is given in Appendix E. □

The case of PSK input alphabets follows straightforwardly as a special case of Theorem 17.
Indeed, the fact that a set of $m$ orthogonal vectors cannot exist in fewer than $m$ dimensions
leads to the following conceptually simple corollaries.

Corollary 18: FOO constellations based on constant-energy input alphabets in $N$ dimensions
cannot have more than $2^N$ points.

Corollary 19: No FOO constellations based on M-PSK input alphabets exist for $M > 4$.

Observe that the criterion in Theorem 17 is that $v_0, \ldots, v_{m-1}$ should be orthogonal, not
necessarily orthonormal. Thus, FOO constellations based on constant-energy input alphabets are
not necessarily hypercubes. In particular, a 4-PSK input alphabet does not have to be equally
spaced to give an FOO constellation. Indeed, any rotationally symmetric but nonequally spaced
4-PSK input alphabet (i.e., a rectangular one) gives an FOO constellation.

C. M-PAM and M-PSK Input Alphabets

In this subsection, we particularize the results in Sec. V-B to practically relevant BICM
schemes, i.e., M-PAM and M-PSK input alphabets with uniform input distributions using the
four binary labelings defined in Sec. II-B.

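Theorem 20 (Coefficient $a^{\mathrm{BI}}_{\Omega}$ for $\Omega = [\mathbb{X}_{\mathrm{PAM}}, \mathbb{L}_m, \mathbb{U}_M]$): For M-PAM input alphabets using
$\mathbb{U}_M$, the coefficient $a^{\mathrm{BI}}_{\Omega}$ for the binary labelings defined in Sec. II-B is given by

$$a^{\mathrm{BI}}_{\Omega} = \begin{cases} \dfrac{3M^2}{4(M^2-1)} \log_2 e, & \text{if } \mathbb{L}_m = \mathbb{G}_m \text{ or } \mathbb{L}_m = \mathbb{F}_m \\ \log_2 e, & \text{if } \mathbb{L}_m = \mathbb{N}_m \\ 0, & \text{if } \mathbb{L}_m = \mathbb{S}_m \end{cases} \qquad (55)$$

Proof: The proof is given in Appendix F. □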



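Theorem 21 (Coefficient $a^{\mathrm{BI}}_{\Omega}$ for $\Omega = [\mathbb{X}_{\mathrm{PSK}}, \mathbb{L}_m, \mathbb{U}_M]$): For M-PSK input alphabets using $\mathbb{U}_M$,
the coefficient $a^{\mathrm{BI}}_{\Omega}$ for the binary labelings defined in Sec. II-B is given by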



. BI 



81o^ 






M2 sin^ 
4 log 


(tt/M)' 
52e 




M2 sin^ 
< 4 log 


(tt/M)' 
52e 


)+(l-sect)^^ 

m 

l + ^tan2(7r/2'= 

fc=2 


M2 sin^ 
4 log 


(vr/M) 
^2e 


M2 sin^ 

v 


(tt/M) 



if L„ 
if L„ 
if L„ 

if L„ 



(56) 



F 



where $\sec x = 1/\cos x$ is the secant function.

Proof: The proof is given in Appendix G. □

In Fig. 6, we present the pmf of the coefficient $a^{\mathrm{BI}}_{\Omega}$ obtained via an exhaustive enumeration
of the 8! = 40320 different binary labelings (without discarding trivial operations) for 8-PAM
and 8-PSK with $\mathbb{U}_8$. For 8-PAM, Fig. 6 (a) shows that many binary labelings are better than the
BRGC at low SNR, the best one being the NBC as found in [23]. On the other extreme we find
the BSGC, which gives a coefficient equal to zero, reflecting the inferior performance in Fig. 2
(b). Based on (45), we obtain that the $E_b/N_0$ for reliable transmission at asymptotically low
rates in this case is $\infty$, and it is independent of $M$. We find that among the 8! possible binary
labelings, there exist 72 classes of binary labelings that have a different $a^{\mathrm{BI}}_{\Omega}$, and therefore, a
different first-order asymptotic behavior. We also note that the BICM capacities for the BRGC
and the FBC in Fig. 2 (b) are different for SNR > 0. However, their coefficient $a^{\mathrm{BI}}_{\Omega}$ in (55) is
the same, and thus, the curves for these labelings in Fig. 2 (b) merge at low rates.

Fig. 6 (b) shows that for 8-PSK, there exist only 26 classes of binary labelings with different
coefficients $a^{\mathrm{BI}}_{\Omega}$. In particular, the NBC and the BSGC result in a moderate coefficient, and the
BRGC in a quite high coefficient. We found that the FBC is the asymptotically optimal binary
labeling for 8-PSK, unique up to trivial operations, and we conjecture it to be optimal for any
M-PSK input alphabet and $m > 2$. Interestingly, there are no binary labelings for 8-PSK that
give a coefficient zero or one, and the number of distinct pmf values is only ten (25 for 8-PAM).

Figure 6. The pmf of $a^{\mathrm{BI}}_{\Omega}$ for 8-PAM (a) and 8-PSK (b) with $\mathbb{U}_8$. The four labelings defined in Sec. II-B are shown with
white markers.

From (45) we know that $\alpha^{\mathrm{BI}}$ determines the behavior of the function $f^{\mathrm{BI}}_{\Omega}(R_c)$ for asymptotically low rates. Following the idea introduced in [21], we analyze how the values of $\alpha^{\mathrm{BI}}$ for PAM and PSK input alphabets behave when $M \to \infty$. A summary of the values of $\lim_{M\to\infty} \alpha^{\mathrm{BI}}$ and $\lim_{R_c\to 0^+} f^{\mathrm{BI}}_{\Omega}(R_c)$ for M-PAM and M-PSK input alphabets using $U_M$ is presented in Table I for the four labelings previously analyzed.* For most of the constellations, there is a bounded loss with respect to the SL when $M \to \infty$. For the BRGC, this difference is 1.25 dB for M-PAM and 0.91 dB for M-PSK. On the other hand, for the NBC and M-PAM, the difference is zero for any M. Note that all the coefficients $\alpha^{\mathrm{BI}}$ in (55) and in (56) are nonincreasing functions of M.

* The limits for M-PSK are obtained based on $\lim_{M\to\infty} \frac{1}{M^2 \sin^2(\pi/M)} = \frac{1}{\pi^2}$ (obtained by L'Hôpital's rule). For the FBC, we obtain numerically that $\sum_{k=2}^{\infty} \tan^2(\pi/2^k) \approx 1.2240$, which gives the coefficient 8.89 in Table I.

Table I

First-order asymptotics of M-PAM and M-PSK input alphabets using $U_M$ for different binary labelings. For each alphabet, the first column is $\lim_{M\to\infty} \alpha^{\mathrm{BI}}$ and the second column is $\lim_{R_c\to 0^+} f^{\mathrm{BI}}_{\Omega}(R_c)$.

Labeling    PAM: lim alpha^BI    PAM: lim f^BI    PSK: lim alpha^BI         PSK: lim f^BI
BRGC        (3/4) log2 e         -0.34 dB         (8/pi^2) log2 e           -0.68 dB
NBC         log2 e               -1.59 dB         (4/pi^2) log2 e           2.33 dB
BSGC        0                    oo               (4/pi^2) log2 e           2.33 dB
FBC         (3/4) log2 e         -0.34 dB         (8.89/pi^2) log2 e        -1.14 dB
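The dB figures quoted for the large-M limits can be checked numerically. The following is a minimal sketch, assuming the limiting coefficients listed in Table I (3/4 for PAM with the BRGC, $8/\pi^2$ and $4/\pi^2$ for PSK with the BRGC and the NBC) and the series from the footnote; the helper name `loss_db` is hypothetical.

```python
import math


def loss_db(alpha_over_log2e):
    """Loss with respect to the Shannon limit, in dB, for a given alpha^BI / log2(e)."""
    return -10.0 * math.log10(alpha_over_log2e)


# Limiting coefficients (divided by log2 e) as M grows without bound.
pam_brgc_limit = 3 / 4
psk_brgc_limit = 8 / math.pi ** 2
psk_nbc_limit = 4 / math.pi ** 2

# Series used for the FBC with PSK; terms decay roughly as 4^-k, so 60 terms suffice.
tan_series = sum(math.tan(math.pi / 2 ** k) ** 2 for k in range(2, 60))
psk_fbc_limit = 4 * (1 + tan_series) / math.pi ** 2

shannon_limit_db = -10.0 * math.log10(math.log2(math.e))

for name, coeff in [("PAM/BRGC", pam_brgc_limit), ("PSK/BRGC", psk_brgc_limit),
                    ("PSK/NBC", psk_nbc_limit), ("PSK/FBC", psk_fbc_limit)]:
    print(f"{name}: loss {loss_db(coeff):.2f} dB, "
          f"limit {shannon_limit_db + loss_db(coeff):.2f} dB")
print(f"tan series: {tan_series:.4f}, FBC numerator: {4 * (1 + tan_series):.2f}")
```

Adding each loss to the Shannon limit of about -1.59 dB gives the second and fourth columns of Table I.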



VI. Numerical Examples 

A. Turbo-coded System Simulation 

In order to validate the analysis presented in the previous sections, we are interested in corroborating whether the use of the NBC instead of the BRGC for PAM input alphabets actually translates into a real gain when capacity-approaching codes are used. To this end, we simulate a BICM scheme which combines a very low rate capacity-approaching code with M-PAM input alphabets. We use Divsalar's rate-1/15 turbo code, formed by a parallel concatenation of two identical 16-state rate-1/8 recursive systematic convolutional (RSC) codes defined by their polynomial generators $(1, 21/23, 25/23, 27/23, 31/23, 33/23, 35/23, 37/23)_8$ [72]. The two RSC codes are separated by a randomly generated interleaver of length N = 16384, and 64 tail bits are added to terminate the trellis, giving an effective code rate of R = 16384/(15 · 16384 + 64). We combine this turbo code (via a randomly generated interleaver) with 4-PAM and 8-PAM using the NBC or the BRGC, yielding $R_c \approx 0.13$ bit/symbol and $R_c \approx 0.2$ bit/symbol, respectively. The constellation symbols are equally likely, the decoder is based on the Log-MAP algorithm, and it performs 12 turbo iterations. In Fig. 7, the bit error rate (BER) performance of such a system is presented.
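As a quick arithmetic check of the rates quoted above, the effective code rate and the resulting spectral efficiencies can be computed directly. This is only a sketch of the bookkeeping, assuming $R_c = mR$ with m = 2 for 4-PAM and m = 3 for 8-PAM; the variable names are illustrative.

```python
from fractions import Fraction

interleaver_length = 16384   # information bits per block
tail_bits = 64               # bits added to terminate the trellis

# Effective rate of the rate-1/15 turbo code including trellis termination.
code_rate = Fraction(interleaver_length, 15 * interleaver_length + tail_bits)

# Bits per symbol for 4-PAM (m = 2) and 8-PAM (m = 3).
rate_4pam = 2 * code_rate
rate_8pam = 3 * code_rate

print(f"R        = {float(code_rate):.5f}")
print(f"Rc 4-PAM = {float(rate_4pam):.4f} bit/symbol")
print(f"Rc 8-PAM = {float(rate_8pam):.4f} bit/symbol")
```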

We study the $E_b/N_0$ needed for the four different constellations to reach the target BER. For 4-PAM, the values for the BRGC and the NBC are, respectively, $E_b/N_0 = 0.99$ dB and $E_b/N_0 = 0.29$ dB, i.e., the NBC offers a gain of 0.4 dB compared to the BRGC. For 8-PAM, the obtained values are $E_b/N_0 = 1.05$ dB and $E_b/N_0 = 0.45$ dB, which again demonstrate the suboptimality of the BRGC in the low SNR regime. Moreover, we also simulated an 8-PAM input alphabet labeled by the BSGC. We obtained in this case $E_b/N_0 = 8.40$ dB, i.e., a degradation of 7.95 dB is caused by a bad selection of the binary labeling. The values of $E_b/N_0$ obtained for these last three cases are shown in Fig. 2(b). These results show that the turbo-coded system performs within 1 dB of capacity, and that the losses of 0.6 dB and 7.95 dB can be observed from the capacity curves as well. This indicates that the results obtained from Fig. 2 for different labelings can be used as an a priori estimate of the system performance when capacity-approaching codes are used.

Figure 7. BER for the rate-1/15 turbo code with 4-PAM and 8-PAM for the BRGC and the NBC ($R_c \approx 0.13$ bit/symbol and $R_c \approx 0.2$ bit/symbol, respectively). Curves: $N_2$ (4-PAM), $G_2$ (4-PAM), $N_3$ (8-PAM), $G_3$ (8-PAM); horizontal axis: $E_b/N_0$ [dB]. The interleaver size is N = 16384, the decoder is based on the Log-MAP algorithm, and it performs 12 turbo iterations. The filled circles represent the $E_b/N_0$ needed for the configuration to reach the target BER, which are also shown for 8-PAM in Fig. 2(b).

B. Capacity vs. $E_b/N_0$

In Fig. 8(a), we show the functions $f^{\mathrm{AW}}(R_c)$ and $f^{\mathrm{CM}}_{\Omega}(R_c)$, as defined earlier in the paper, using 4-PAM and 8-PAM input alphabets. We also show $f^{\mathrm{BI}}_{\Omega}(R_c)$ for 4-PAM and 8-PAM input alphabets for different binary labelings and for hierarchical 4-PAM and 8-PAM constellations (Example 5). The curves in Fig. 8(a) intersect the horizontal axis at $E_b/N_0 = 1/\alpha$, where $1/\alpha = 1/\log_2 e = -1.59$ dB represents the SL. From this figure, we observe that for CM both constellations are FOO, while for BICM only four of them are FOO, the ones labeled by the NBC.

Figure 8. (a) AWGN capacity, CM capacity, and BICM capacities for M-PAM with the BRGC, NBC and FBC. The BICM capacity for hierarchical 4-PAM with $V = [-1, -5]^T$ and 8-PAM with $V = [-1, -2, -6]^T$ is also shown. The white circles give the performance at $R_c = 0$, where $\alpha^{\mathrm{BI}}$ determines the BICM capacity. The BRGC and FBC are equivalent for M = 4. (b) SNR gap in (57) for the same capacities and for 16-PAM.

In Fig. 9(a), similar results for 8-PSK are shown. We also include the results for the OTTO and OTOTO constellations in Fig. 4 (Example 4). From this figure, we observe that for the CM capacity the 8-PSK input alphabet gives an FOO constellation, and for BICM, the OTTO and the OTOTO constellations are FOO. Moreover, for high SNR, the OTTO constellation results in a higher capacity than the constellations based on 8-PSK input alphabets.

C. The SNR Gap 

Borrowing the idea from [26], we define the SNR gap as the horizontal difference* between the CM and BICM capacity and the capacity of the AWGN channel for a given $R_c$, i.e.,

$$\Delta^{\mathrm{CM}}_{\Omega}(R_c) \triangleq \frac{f^{\mathrm{CM}}_{\Omega}(R_c)}{f^{\mathrm{AW}}(R_c)}, \qquad \Delta^{\mathrm{BI}}_{\Omega}(R_c) \triangleq \frac{f^{\mathrm{BI}}_{\Omega}(R_c)}{f^{\mathrm{AW}}(R_c)}. \qquad (57)$$

* The gap is the same regardless of whether the horizontal axis represents $E_b/N_0$ or SNR.

Figure 9. (a) AWGN capacity, CM capacity, and BICM capacities for 8-PSK with the BRGC, NBC, FBC and BSGC. The BICM capacities for the OTTO and OTOTO constellations are also shown. The white circles give the performance at $R_c = 0$, where $\alpha^{\mathrm{BI}}$ determines the BICM capacity. (b) SNR gap in (57) for the same capacities.

These expressions, which represent the additional energy needed for a given constellation to achieve the same $R_c$ as the optimal scheme (the AWGN capacity), are evaluated numerically in Figs. 8(b) and 9(b). In Table II we present a summary of the SNR gap at asymptotically low rates for different constellations. This asymptotic SNR gap is given by $\log_2 e/\alpha^{\mathrm{BI}}$ and is a scaled special case of the results presented in Sec. IV-C.
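The asymptotic SNR gap $\log_2 e/\alpha^{\mathrm{BI}}$ for finite M can be evaluated from closed-form coefficients. The sketch below is an illustration, not the authors' code: the PAM expression is what (71) gives when only the first bit contributes with $|E_{X|C_0=u}[X]| = M/2$, and the PSK expressions are assembled from (72), (73), (74), and (78) of Appendix G. All function names are hypothetical.

```python
import math


def gap_db(alpha_over_log2e):
    """Asymptotic SNR gap log2(e) / alpha^BI in dB (infinite when alpha^BI is 0)."""
    if alpha_over_log2e == 0:
        return math.inf
    return -10.0 * math.log10(alpha_over_log2e)


def pam_brgc(M):
    # (71) with a single contributing bit whose conditional means are +/- M/2.
    return 3 * M * M / (4 * (M * M - 1))


def psk_brgc(M):
    # (73): two bits contribute 1 / sin^2(pi/M) each.
    return 8 / (M * math.sin(math.pi / M)) ** 2


def psk_nbc(M):
    # Half of the BRGC coefficient, since only the first bit contributes.
    return psk_brgc(M) / 2


def psk_bsgc(M):
    # (72) with the k = 1 term (74) and the squared k = 0 term (78); valid for M >= 8.
    s = math.sin(math.pi / M)
    return 4 / M ** 2 * (1 / s ** 2 + (2 * s / math.cos(2 * math.pi / M)) ** 2)


for M in (4, 8, 16):
    print(f"{M}-PAM BRGC: {gap_db(pam_brgc(M)):.2f} dB")
print(f"8-PSK BRGC: {gap_db(psk_brgc(8)):.2f} dB")
print(f"8-PSK NBC : {gap_db(psk_nbc(8)):.2f} dB")
print(f"8-PSK BSGC: {gap_db(psk_bsgc(8)):.2f} dB")
```

The printed values can be compared against the corresponding entries of Table II; small differences in the last digit may come from rounding versus truncation.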

VII. Conclusions 

In this paper, we introduced a general model for BICM which considers arbitrary input alphabets, input distributions, and binary labelings, and we analyzed different aspects of the BICM capacity. Probabilistic shaping for BICM was analyzed and the relation between the BICM capacity and $E_b/N_0$ was studied. Four binary labelings (BRGC, NBC, BSGC, and FBC) were analyzed in detail, and for 8-PAM with uniform input distribution, the results showed that the BICM capacity is maximized by the NBC and the FBC for 36% of the rates.

First-order asymptotics of the BICM capacity for arbitrary constellations were presented, which allowed us to analyze the behavior of the BICM capacity for low rates. The $E_b/N_0$ required for reliable transmission at asymptotically low rates was found to take values between the SL -1.59 dB and infinity. The asymptotic analysis was used to compare binary labelings for PAM and PSK input alphabets, as well as to predict the actual system performance at low rates when capacity-approaching codes are used. The asymptotically best labelings for M-PAM and M-PSK with uniform input distributions appear to be the NBC and FBC, respectively.

Table II

The SNR gap at asymptotically low rates for BICM and different constellations.

Constellation          Labeling     log2 e / alpha^BI [dB]
4-PAM                  BRGC/FBC     0.96
4-PAM                  NBC          0
4-PAM Hierarchical     -            0
8-PSK                  BRGC         0.69
8-PSK                  NBC          3.69
8-PSK                  FBC          0.32
8-PSK                  BSGC         3.01
OTTO                   -            0
OTOTO                  -            0
8-PAM Hierarchical     -            0
8-PAM                  BRGC/FBC     1.18
8-PAM                  NBC          0
8-PAM                  BSGC         oo
16-PAM                 BRGC/FBC     1.23
16-PAM                 NBC          0
16-PAM                 BSGC         oo

Using the first-order asymptotics of the BICM capacity, we analyzed the problem of FOO constellations for BICM. We showed that, under some mild conditions, the fading does not change the analysis of FOO constellations made for the AWGN channel. Interpreting the codewords of a binary labeling as the vertices of a hypercube, a constellation for BICM with uniform input distributions is FOO if and only if the input alphabet forms a linear projection of this hypercube. Important special cases of this result are that constellations based on equally spaced M-PAM and M-QAM input alphabets are FOO if and only if the NBC is used. Another particular case is the hierarchical (nonequally spaced) M-PAM input alphabets labeled by the NBC. We also showed that constellations based on constant-energy M-PSK input alphabets can never be FOO if M > 4, regardless of the binary labeling.

In this paper, we focused on asymptotically low rates, and we answered the question about FOO constellations for this case. The analysis of second-order optimal constellations for BICM, and the dual problem for asymptotically high rates, or more generally, for any rate, are still open research problems.

Appendix A 
Proof of Theorem 7

In [73, Theorem 3], the model $Y = \mathbf{H}X + Z$ is considered, where $\mathbf{H}$ is a matrix. This theorem states that the AMI between X and Y when $\mathbf{H}$ is known at the receiver can be expressed as

$$I_X(X;Y) = \frac{\log_2 e}{N_0}\,\mathrm{trace}\left(E_{\mathbf{H}}\left[\mathbf{H}\,\mathrm{cov}(X)\,\mathbf{H}^T\right]\right) + O(N_0^{-2}) \qquad (58)$$

when $N_0 \to \infty$, if the two following conditions are fulfilled:

• There exist finite constants c > 0 and d > 0 such that $E_X[\|X\|^{4+d}] < c$.

• There exists a constant $\nu > 0$ such that the matrix $\mathbf{H}$ satisfies $\Pr\{\|\mathbf{H}\| > \delta\} \le \exp(-\delta^{\nu})$ for all sufficiently large $\delta > 0$.

Since we consider real-valued vectors only, we have replaced the Hermitian conjugates in [73] by transpositions in (58). Moreover, [73, Theorem 3] requires Z, X, and $\mathbf{H}X$ to be "proper complex". Nevertheless, the results are still valid if the two conditions in the items above are fulfilled, as explained in [73, Remark 6].

The first condition is fulfilled since $x_0, \ldots, x_{M-1}$ are all finite, and therefore, $E_X[\|X\|^d] < \infty$ for all d > 0. The second condition is fulfilled because $\mathbf{H} = \mathrm{diag}(H)$ and because of the condition (5) imposed on H. Moreover, since $\mathbf{H} = \mathrm{diag}(H)$ and H contains i.i.d. elements, $E_{\mathbf{H}}[\mathbf{H}\,\mathrm{cov}(X)\,\mathbf{H}^T] = E_H[H^2]\,\mathrm{cov}(X)$. The use of the identity $\mathrm{trace}(\mathrm{cov}(X)) = E_X[\|X\|^2] - \|E_X[X]\|^2$, the definition of $E_s$, and the relation $\mathrm{SNR} = E_H[H^2] E_s/N_0$ in (58), gives (47).

Appendix B
Proof of Theorem 10

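Expanding the inner sum in (48), we obtain

$$\sum_{u\in\{0,1\}} P_{C_k}(u)\,\|E_{X|C_k=u}[X]\|^2 = P_{C_k}(0)\,\Big\|\sum_{i\in\mathcal{I}_{k,0}} x_i P_{X|C_k=0}(x_i)\Big\|^2 + P_{C_k}(1)\,\Big\|\sum_{i\in\mathcal{I}_{k,1}} x_i P_{X|C_k=1}(x_i)\Big\|^2.$$

Using the identity $\|a\|^2 + \|b\|^2 = \frac{1}{2}\|a - b\|^2 + \frac{1}{2}\|a + b\|^2$ and (31), we obtain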



Pc,{u)\\Ex\c,=u[X] 

ue{o,i} 



+ 



PC.(O) J2 X^PxlC,=0ix,) + J Pc,il) J2 ^^PX\CM^^) 



; XiPxiXi 



— , \^ XiPxiXi] 



1 

+ 2 



In this expression, and based on the definition of $q_{i,k}$, we recognize the first term as the first term inside the outer sum in (49), and the second term as the second term inside the outer sum in (49). This used in (48) completes the proof.



Appendix C 
Proof of Theorem [H] 

Consider any FOO constellation $[X_{\mathrm{PAM}}, L, U_M]$, where the binary labeling is defined by $q_{i,k}$ for $k = 0, \ldots, m-1$ and $i = 0, \ldots, M-1$. From Theorem 12, there exist real values $v_k$ for $k = 0, \ldots, m-1$ such that

$$x_i = \sum_{k=0}^{m-1} q_{i,k} v_k \qquad (59)$$

for $i = 0, \ldots, M-1$. We wish to find all combinations of $q_{i,k}$ and $v_k$ that satisfy (59).

We start by giving two properties of the column vector $V = [v_0, v_1, \ldots, v_{m-1}]^T$ that will be used later in the proof.

• Since all pairwise differences $x_i - x_j = \sum_{k=0}^{m-1} (q_{i,k} - q_{j,k}) v_k$ are even numbers, and since $q_{i,k} - q_{j,k} \in \{0, \pm 2\}$, we conclude that $V \in \mathbb{Z}^m$.

• Because of (59), the sum $\pm v_0 \pm v_1 \pm \cdots \pm v_{m-1}$, with all combinations of signs, generates all the elements in $X_{\mathrm{PAM}}$. Since $X_{\mathrm{PAM}}$ is formed by M distinct elements, $\pm v_0 \pm v_1 \pm \cdots \pm v_{m-1}$ must yield M different values, and therefore, $|v_k|$ for $k = 0, \ldots, m-1$ must all be distinct.

Consider a given bit position $l \in \{0, \ldots, m-1\}$ and define, for $i = 0, \ldots, M-1$,

$$\begin{aligned} s_i &= x_i \bmod 2v_l \\ &= \sum_{k=0,\,k\ne l}^{m-1} q_{i,k} v_k \pm v_l \bmod 2v_l \\ &= \sum_{k=0,\,k\ne l}^{m-1} q_{i,k} v_k + v_l \bmod 2v_l, \end{aligned} \qquad (60)$$

where in the last step we have used the identity $(a \pm b) \bmod 2b = (a + b) \bmod 2b$. Because $x_i$ is an odd integer, $s_i \in \{1, 3, \ldots, 2v_l - 1\}$ for all i. We will now study the vector $S = [s_0, s_1, \ldots, s_{M-1}]^T$ and in particular count how many times each odd integer occurs in this vector. We will do this in two ways, in order to determine which values $v_l$ can take on.

• It follows from (60) that $s_i$ is independent of $q_{i,l}$ for all i. Thus, if two codewords $c_i$ and $c_j$ differ only in bit l, then $s_i = s_j$. This proves that each value $1, 3, \ldots, 2v_l - 1$ occurs an even number of times in S.

• Because X is a vector of odd integers in increasing order and S consists of the same elements counted modulo $2v_l$, S consists of identical segments $[1, 3, \ldots, 2v_l - 1]^T$ of length $v_l$. If $v_l$ divides M, then S contains a whole number of such segments and each value in $\{1, 3, \ldots, 2v_l - 1\}$ occurs exactly $M/v_l$ times in S. If on the other hand $v_l$ does not divide M, then the first and the last segment are truncated. In this case, S includes some values $\lfloor M/v_l \rfloor$ times and other values $\lfloor M/v_l \rfloor + 1$ times, where $\lfloor \cdot \rfloor$ denotes the integer part.

Since either $\lfloor M/v_l \rfloor$ or $\lfloor M/v_l \rfloor + 1$ is odd, and each value must occur in S an even number of times, we conclude from these two properties that $v_l$ must divide M. Furthermore, the number of occurrences $M/v_l$ must be even, so $v_l$ must divide M/2.

In conclusion, $v_0, \ldots, v_{m-1}$ must all divide M/2, and their absolute values must be all distinct. Since M/2 has only m divisors, which are $1, 2, 4, \ldots, 2^{m-1}$, they must all appear in V, but they can do so in any order and with any sign. If $V = [-1, -2, -4, \ldots, -2^{m-1}]^T$, then (59) is fulfilled by the NBC (see Example 6). Negating $v_k$ for any k corresponds to inverting bit k of the NBC, whereas reordering the rows of V corresponds to permuting columns in $N_m$.
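The conclusion of this proof can be illustrated by brute force for small alphabets. The sketch below is an assumption-laden illustration: it takes $q_{i,k} = +1$ when bit k of codeword i is 0 and $-1$ otherwise, counts bits from the least significant one, and searches integer vectors V on a small grid (the proof shows that V must be integer-valued). The helper names `project`, `pam`, and `foo_labelings` are hypothetical.

```python
from itertools import permutations, product


def project(labeling, v):
    """Evaluate x_i = sum_k q_{i,k} v_k, with q_{i,k} = 1 - 2 * (bit k of codeword i)."""
    m = len(v)
    return [sum((1 - 2 * ((c >> k) & 1)) * v[k] for k in range(m)) for c in labeling]


def pam(m):
    """Equally spaced M-PAM alphabet of odd integers in increasing order."""
    M = 1 << m
    return [-(M - 1) + 2 * i for i in range(M)]


def foo_labelings(m, vmax):
    """All labelings of M-PAM for which some integer V in [-vmax, vmax]^m satisfies (59)."""
    target = pam(m)
    grid = list(product(range(-vmax, vmax + 1), repeat=m))
    return [lab for lab in permutations(range(1 << m))
            if any(project(lab, v) == target for v in grid)]


# The NBC with V = [-1, -2, ..., -2^(m-1)] reproduces the PAM alphabet.
nbc_ok = all(project(range(1 << m), [-(2 ** k) for k in range(m)]) == pam(m)
             for m in range(1, 6))

# For 4-PAM, the labelings satisfying (59) are the NBC and its trivial variants.
found = foo_labelings(2, 3)
print(nbc_ok, len(found), found)
```

For 4-PAM the search returns the NBC together with its bit-inverted and column-permuted versions, while the BRGC `(0, 1, 3, 2)` is absent from the list.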

Appendix D 
Proof of Theorem 15

Let $M' = 2^{m'}$, $M'' = 2^{m''}$, and $M = 2^m = 2^{m'+m''}$. To prove the "if" part, we assume that there exist two FOO constellations $[X', L', U_{M'}]$ and $[X'', L'', U_{M''}]$. Then, from Theorem 12,

$$x'_l = \sum_{k=0}^{m'-1} q'_{l,k} v'_k, \qquad l = 0, \ldots, M'-1, \qquad (61)$$

$$x''_j = \sum_{k=0}^{m''-1} q''_{j,k} v''_k, \qquad j = 0, \ldots, M''-1. \qquad (62)$$

We analyze the two-dimensional constellation constructed as $\Omega = [X, L, U_M] = [X' \otimes X'', L' \otimes L'', U_{M'M''}]$. It follows from the definition of the operator $\otimes$ that for all $l = 0, \ldots, M'-1$, $j = 0, \ldots, M''-1$, and $k = 0, \ldots, m-1$,

$$q_{M''l+j,k} = \begin{cases} q'_{l,k}, & k = 0, \ldots, m'-1 \\ q''_{j,k-m'}, & k = m', \ldots, m-1. \end{cases}$$

We will now show that $\Omega$ is FOO by explicitly constructing a matrix V that satisfies Theorem 12. To this end, we define the vectors

$$v_k \triangleq \begin{cases} [v'_k, 0], & k = 0, \ldots, m'-1 \\ [0, v''_{k-m'}], & k = m', \ldots, m-1 \end{cases}$$

with $v'_k$ and $v''_k$ that satisfy (61)-(62). The vectors $v_k$ constructed in this manner have the property that for all $l = 0, \ldots, M'-1$ and $j = 0, \ldots, M''-1$,

$$\begin{aligned} \sum_{k=0}^{m-1} q_{M''l+j,k} v_k &= \sum_{k=0}^{m'-1} q_{M''l+j,k} v_k + \sum_{k=0}^{m''-1} q_{M''l+j,k+m'} v_{k+m'} \\ &= \sum_{k=0}^{m'-1} q'_{l,k} [v'_k, 0] + \sum_{k=0}^{m''-1} q''_{j,k} [0, v''_k] \\ &= \Big[ \sum_{k=0}^{m'-1} q'_{l,k} v'_k, \; \sum_{k=0}^{m''-1} q''_{j,k} v''_k \Big] \\ &= x_{M''l+j}. \end{aligned}$$

Substituting $M''l + j = i$ yields (53), which shows that $\Omega$ is FOO. Finally, to show that the constellation $[X' \otimes X'', \Pi_C(L' \otimes L''), U_{M'M''}]$ is also FOO, it suffices to observe that synchronously permuting the columns of $Q(L)$ and the rows of V does not change the right-hand side of (52), which completes the proof of the "if" part.*

* An intuitive explanation for this is that reordering the bits of all codewords does not change the constellation's performance.

For the "only if" part, consider any two-dimensional FOO constellation $[X, L, U_M]$. By Theorem 12, the elements of X fulfill (53), which can be decomposed into scalar equalities as

$$x_{i,n} = \sum_{k=0}^{m-1} q_{i,k} v_{k,n}, \qquad i = 0, \ldots, M-1, \quad n = 0, 1 \qquad (63)$$

where $x_i = [x_{i,0}, x_{i,1}]$ for $i = 0, \ldots, M-1$ and $v_k = [v_{k,0}, v_{k,1}]$ for $k = 0, \ldots, m-1$. We will use this decomposition to characterize the points with the largest coordinate value in one of the dimensions. Because $q_{i,k} \in \{-1, 1\}$, $x_{i,n}$ takes values in the set $\pm v_{0,n} \pm \cdots \pm v_{m-1,n}$. The largest of these values is

$$\hat{x}_n \triangleq \max_{i=0,\ldots,M-1} x_{i,n} = |v_{0,n}| + \cdots + |v_{m-1,n}|.$$

If $v_{k,n} \ne 0$ for all $k = 0, \ldots, m-1$, then the symbol $x_i$ for which $x_{i,n} = \hat{x}_n$ is unique. If $v_{k,n} = 0$ for one value of k, then $x_{i,n} = \hat{x}_n$ for two values of i, and so on. Generalizing, there exist $2^a$ symbols for which $x_{i,n} = \hat{x}_n$ if and only if there are a zeros among $v_{0,n}, \ldots, v_{m-1,n}$. Analogous relations hold for the minimum of $x_{i,n}$.

For the special case when X is obtained from two one-dimensional input alphabets $X'$ and $X''$ as $X = X' \otimes X''$, the two-dimensional symbols are $x_{M''l+j} = [x'_l, x''_j]$ for $l = 0, \ldots, M'-1$ and $j = 0, \ldots, M''-1$. We will prove that there exist labelings $L'$ and $L''$ such that $[X', L', U_{M'}]$ and $[X'', L'', U_{M''}]$ are both FOO, and we will identify the set of all such labelings. We do this by analyzing $x_{i,n}$ for n = 0 and 1 separately, beginning with n = 0. There are $M''$ symbols $x_i$ having $x_{i,0} = x'_l$ for each $l = 0, \ldots, M'-1$. This holds in particular for $x'_l = \hat{x}_0$. From the result in the previous paragraph, there are therefore $m''$ zeros among $v_{0,0}, \ldots, v_{m-1,0}$. We will first consider the special case when the zeros are $v_{m',0}, \ldots, v_{m-1,0}$, i.e., when

$$[v_{0,0}, \ldots, v_{m'-1,0}, v_{m',0}, \ldots, v_{m-1,0}] = [\underbrace{v_{0,0}, \ldots, v_{m'-1,0}}_{m' \text{ nonzero elements}}, \underbrace{0, \ldots, 0}_{m'' \text{ zeros}}], \qquad (64)$$

and will later generalize the obtained results to an arbitrary location of the $m''$ zeros.
Assuming that (64) holds, $x'_l$ can, for all $l = 0, \ldots, M'-1$, be written as

$$\begin{aligned} x'_l &= x_{M''l+j,0} \\ &= \sum_{k=0}^{m-1} q_{M''l+j,k} v_{k,0} \\ &= \sum_{k=0}^{m'-1} q_{M''l+j,k} v_{k,0} \end{aligned} \qquad (65)$$

where the second line follows from (63) and the third from (64). The relation holds for all $j = 0, \ldots, M''-1$.

We will now conclude from (65) that

$$q_{M''l+j,k} = q_{M''l,k}, \qquad l = 0, \ldots, M'-1, \quad j = 0, \ldots, M''-1, \quad k = 0, \ldots, m'-1. \qquad (66)$$

This can be seen as follows. The sequence $q_{M''l+j,0}, \ldots, q_{M''l+j,m'-1}$ can take on $2^{m'} = M'$ different values, because each element is $\pm 1$. For given values of $v_{k,0}$, these sequences all yield different values of $x'_l$ in (65), because these values are, by assumption, all distinct. Thus the sequence $q_{M''l+j,0}, \ldots, q_{M''l+j,m'-1}$ is uniquely determined by $x'_l$ and $v_{0,0}, \ldots, v_{m'-1,0}$. Since both $x'_l$ and $v_{k,0}$ are independent of j, so is $q_{M''l+j,k}$. Therefore $q_{M''l+j,k} = q_{M''l,k}$.

From this conclusion, (65) simplifies into

$$x'_l = \sum_{k=0}^{m'-1} q_{M''l,k} v_{k,0}, \qquad l = 0, \ldots, M'-1,$$

which is a one-dimensional version of (53). It is satisfied only if $[X', L', U_{M'}]$ is FOO, where the elements $q'_{l,k}$ of $Q(L')$ are

$$q'_{l,k} = q_{M''l,k}, \qquad l = 0, \ldots, M'-1, \quad k = 0, \ldots, m'-1. \qquad (67)$$

A similar analysis for n = 1 shows that $[X'', L'', U_{M''}]$ is also FOO and, furthermore, yields analogous expressions to (66) and (67) as

$$q_{M''l+j,k} = q_{j,k}, \qquad l = 0, \ldots, M'-1, \quad j = 0, \ldots, M''-1, \quad k = m', \ldots, m-1 \qquad (68)$$

$$q''_{j,k} = q_{j,k+m'}, \qquad j = 0, \ldots, M''-1, \quad k = 0, \ldots, m''-1 \qquad (69)$$

where $q''_{j,k}$ are the elements of $Q(L'')$.

Together, (66)-(69) show that for $l = 0, \ldots, M'-1$ and $j = 0, \ldots, M''-1$,

$$q_{M''l+j,k} = \begin{cases} q'_{l,k}, & k = 0, \ldots, m'-1 \\ q''_{j,k-m'}, & k = m', \ldots, m-1 \end{cases}$$

or, equivalently, that $Q(L) = Q(L') \otimes Q(L'')$. To convert this relation into a relation between (unmodified) labeling matrices $L$, $L'$, and $L''$, we can apply Q to conclude that L is a column-permuted version of $L' \otimes L''$.

To complete the proof, we need to consider the case when the $m''$ zeros among $v_{0,0}, \ldots, v_{m-1,0}$ are not the last $m''$ elements as in (64). To this end, we apply an arbitrary row permutation to the V matrix, whose first column is given by (64). Permuting the m rows of V means permuting the m elements of (64), which in turn means that the $m''$ zeros are shifted into arbitrary locations. Furthermore, as was observed in the first part of this proof, a row permutation of V corresponds to a column permutation of $Q(L)$, or, equivalently, a column permutation of L. We can therefore conclude that regardless of where the $m''$ zeros are located, the labeling L must be a column-permuted version of $L' \otimes L''$.

Appendix E 
Proof of Theorem 17

If $v_0, \ldots, v_{m-1}$ are orthogonal, then $v_k v_l^T = 0$ for $k \ne l$. The symbol energies $\|x_i\|^2$, for $i = 0, \ldots, M-1$, can be calculated from (53) as

$$\begin{aligned} \|x_i\|^2 &= \sum_{k=0}^{m-1} \sum_{l=0}^{m-1} q_{i,k} q_{i,l} v_k v_l^T \\ &= \sum_{k=0}^{m-1} q_{i,k}^2 \|v_k\|^2 \\ &= \sum_{k=0}^{m-1} \|v_k\|^2 \end{aligned}$$

which is independent of i. This completes the "if" part of the theorem.

For the "only if" part, we make use of the identity

$$8 b c^T = \|a + b + c\|^2 - \|a + b - c\|^2 - \|a - b + c\|^2 + \|a - b - c\|^2, \qquad (70)$$

which holds for any vectors a, b, and c. Let X be any FOO constant-energy input alphabet and let k and l be any pair of distinct integers $0 \le k, l \le m-1$. Define

$$a \triangleq \sum_{j=0,\; j\notin\{k,l\}}^{m-1} v_j,$$

$b = v_k$, and $c = v_l$. From (53), the four vectors $a \pm b \pm c$ all belong to X. Thus, all four have the same energy and the right-hand side of (70) is zero. Thus $v_k v_l^T = b c^T = 0$. This holds for all pairs of distinct k and l, which completes the proof.
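Both ingredients of this proof are easy to check numerically. The sketch below, which is only an illustration with hypothetical helper names, verifies identity (70) on random integer vectors and shows how the symbol energies of a projected hypercube behave for an orthogonal and a non-orthogonal choice of the rows of V.

```python
import random
from itertools import product


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mix(a, b, c, sb, sc):
    """Return a + sb * b + sc * c for signs sb, sc in {+1, -1}."""
    return [x + sb * y + sc * z for x, y, z in zip(a, b, c)]


def rhs_70(a, b, c):
    """Right-hand side of identity (70)."""
    sq = lambda u: dot(u, u)
    return (sq(mix(a, b, c, 1, 1)) - sq(mix(a, b, c, 1, -1))
            - sq(mix(a, b, c, -1, 1)) + sq(mix(a, b, c, -1, -1)))


def energies(rows):
    """Set of symbol energies ||sum_k q_k v_k||^2 over all sign patterns q."""
    dim = len(rows[0])
    result = set()
    for signs in product((1, -1), repeat=len(rows)):
        point = [sum(s * v[d] for s, v in zip(signs, rows)) for d in range(dim)]
        result.add(dot(point, point))
    return result


random.seed(0)
vectors = [[random.randint(-9, 9) for _ in range(3)] for _ in range(300)]
identity_holds = all(8 * dot(b, c) == rhs_70(a, b, c)
                     for a, b, c in zip(vectors[0::3], vectors[1::3], vectors[2::3]))

print(identity_holds)
print(energies([[1, 0], [0, 1]]))   # orthogonal rows: a single energy level
print(energies([[1, 0], [1, 1]]))   # non-orthogonal rows: several energy levels
```

With orthogonal rows every sign pattern gives the same energy, which is the "if" part; the second example shows how a nonzero inner product immediately produces unequal energies.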

Appendix F 
Proof of Theorem 20

For $P = U_M$, the average symbol energy is given by $E_s = (M^2-1)/3$ and the constellation is zero mean, i.e., $E_X[X] = 0$. Therefore, the coefficient $\alpha^{\mathrm{BI}}$ in (48) is

$$\alpha^{\mathrm{BI}} = \frac{3 \log_2 e}{M^2 - 1} \sum_{k=0}^{m-1} \sum_{u\in\{0,1\}} \frac{1}{2} E_{X|C_k=u}[X]^2. \qquad (71)$$

For the BRGC, $E_{X|C_k=u}[X] = 0$ for $k = 1, \ldots, m-1$ and $u \in \{0, 1\}$. For k = 0 we find that

$$\left| E_{X|C_0=u}[X] \right| = \frac{M}{2},$$

which used in (71) gives the desired result.

For the NBC, we note that

$$\left| E_{X|C_k=u}[X] \right| = \frac{M}{2} \left(\frac{1}{2}\right)^{k}.$$

Using the fact that

$$\sum_{k=0}^{m-1} \left(\frac{1}{2}\right)^{2k} = \frac{4}{3}\left(1 - \frac{1}{M^2}\right),$$

the result $\alpha^{\mathrm{BI}} = \log_2 e$ is obtained.*

* A similar argument for the proof of the NBC has been used in [23].

That $\alpha^{\mathrm{BI}} = 0$ if the labeling is $S_m$ follows trivially because of the construction of the BSGC, i.e., $E_{X|C_k=u}[X] = 0$ for $k = 0, \ldots, m-1$ and $u \in \{0, 1\}$.

For the FBC, finally, its symmetry results in the same condition as for the BRGC, i.e., $E_{X|C_k=u}[X] = 0$ for $k = 1, \ldots, m-1$ and $u \in \{0, 1\}$. Moreover, since for k = 0 the BRGC and the FBC are the same, the coefficient $\alpha^{\mathrm{BI}}$ is also the same.



Appendix G 
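Proof of Theorem 21

For PSK and any k, $P_{C_k}(0) E_{X|C_k=0}[X] + P_{C_k}(1) E_{X|C_k=1}[X] = E_X[X] = 0$. Furthermore, since $P_{C_k}(0) = P_{C_k}(1) = 1/2$, $\|E_{X|C_k=0}[X]\|^2 = \|E_{X|C_k=1}[X]\|^2$. From these equalities, (48) reduces to

$$\alpha^{\mathrm{BI}} = \log_2 e \sum_{k=0}^{m-1} \left\| E_{X|C_k=0}[X] \right\|^2 = \frac{4 \log_2 e}{M^2} \sum_{k=0}^{m-1} \Big\| \sum_{i\in\mathcal{I}_{k,0}} x_i \Big\|^2. \qquad (72)$$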



A. Proof for the BRGC 

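Because of the symmetry of PSK input alphabets and the BRGC, $\|\sum_{i\in\mathcal{I}_{k,0}} x_i\|^2$ in (72) is zero for $k = 2, \ldots, m-1$. Moreover, by symmetry, $\|\sum_{i\in\mathcal{I}_{0,0}} x_i\|^2 = \|\sum_{i\in\mathcal{I}_{1,0}} x_i\|^2$. Since $\mathcal{I}_{0,0} = \{0, \ldots, M/2-1\}$, the coefficient in (72) is given by

$$\begin{aligned} \alpha^{\mathrm{BI}} &= \frac{8 \log_2 e}{M^2} \Big\| \sum_{i=0}^{M/2-1} x_i \Big\|^2 \\ &= \frac{8 \log_2 e}{M^2} \left[ \left( \sum_{i=0}^{M/2-1} \cos\frac{(2i+1)\pi}{M} \right)^2 + \left( \sum_{i=0}^{M/2-1} \sin\frac{(2i+1)\pi}{M} \right)^2 \right]. \end{aligned} \qquad (73)$$

Using [74, eq. (1.341.3)] we find that the first sum in (73) is zero, and from [74, eq. (1.341.1)] the second sum in (73) is equal to $1/\sin(\pi/M)$. This completes the first part of the proof.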
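The closed form obtained from (73) can be compared with a direct evaluation of (72). The sketch below is illustrative only: it assumes unit-energy PSK points $x_i = [\cos((2i+1)\pi/M), \sin((2i+1)\pi/M)]$ represented as complex numbers, uniform inputs, and bit indices counted from the least significant bit (the ordering does not affect the sum over k). The function name is hypothetical.

```python
import cmath
import math


def psk_alpha_over_log2e(labeling, m):
    """Evaluate (72) divided by log2(e): (4 / M^2) * sum_k |sum_{i: bit k = 0} x_i|^2."""
    M = 1 << m
    points = [cmath.exp(1j * (2 * i + 1) * math.pi / M) for i in range(M)]
    total = 0.0
    for k in range(m):
        s0 = sum(x for x, c in zip(points, labeling) if not (c >> k) & 1)
        total += abs(s0) ** 2
    return 4.0 * total / M ** 2


def brgc(m):
    return [i ^ (i >> 1) for i in range(1 << m)]


def nbc(m):
    return list(range(1 << m))


for m in (3, 4, 5):
    M = 1 << m
    closed_form = 8 / (M * math.sin(math.pi / M)) ** 2
    print(M, psk_alpha_over_log2e(brgc(m), m), closed_form,
          psk_alpha_over_log2e(nbc(m), m), closed_form / 2)

# The two trigonometric sums used after (73), checked for M = 16.
M = 16
cos_sum = sum(math.cos((2 * i + 1) * math.pi / M) for i in range(M // 2))
sin_sum = sum(math.sin((2 * i + 1) * math.pi / M) for i in range(M // 2))
print(cos_sum, sin_sum, 1 / math.sin(math.pi / M))
```

The direct evaluation agrees with $8/(M^2 \sin^2(\pi/M))$ for the BRGC and with half of that value for the NBC, as stated in parts A and B of this appendix.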



B. Proof for the NBC 

For the NBC, $\|\sum_{i\in\mathcal{I}_{k,0}} x_i\|^2$ in (72) is zero for $k = 1, \ldots, m-1$. Moreover, since the first column of $N_m$ is always equal to the first column of $G_m$, it is clear that the coefficient for the NBC is half of the one for the BRGC.

C. Proof for the BSGC

By construction, $S_m = G_m$ for all the columns except the first one, and therefore, only two bit positions contribute in the outer sum in (72), i.e., k = 0 and k = 1. From the proof for the BRGC, the contribution for k = 1 is known to be

$$\Big\| \sum_{i\in\mathcal{I}_{1,0}} x_i \Big\|^2 = \frac{1}{\sin^2(\pi/M)}. \qquad (74)$$

For k = 0, we need the index set (cf. the definition of the BSGC)

$$\begin{aligned} \mathcal{I}_{0,0} &= \{0, 4, \ldots, M/2 - 4\} \cup \{3, 7, \ldots, M/2 - 1\} \\ &\quad \cup \{M/2 + 1, M/2 + 5, \ldots, M - 3\} \cup \{M/2 + 2, M/2 + 6, \ldots, M - 2\} \\ &= \bigcup_{k=0}^{M/8-1} \{4k, \; M/2 - 1 - 4k, \; M/2 + 1 + 4k, \; M - 2 - 4k\}. \end{aligned} \qquad (75)$$

This partitioning of $\mathcal{I}_{0,0}$ into four subsets will now be used to calculate

$$\Big\| \sum_{i\in\mathcal{I}_{0,0}} x_i \Big\|^2 = \left( \sum_{i\in\mathcal{I}_{0,0}} \cos\frac{(2i+1)\pi}{M} \right)^2 + \left( \sum_{i\in\mathcal{I}_{0,0}} \sin\frac{(2i+1)\pi}{M} \right)^2. \qquad (76)$$

We split the sum over $\mathcal{I}_{0,0}$ in the second term of (76) into four sums, one for each subset in (75), which yields

$$\sum_{i\in\mathcal{I}_{0,0}} \sin\frac{(2i+1)\pi}{M} = \sum_{k=0}^{M/8-1} \left[ \sin\frac{(1+8k)\pi}{M} + \sin\frac{(M-1-8k)\pi}{M} \right] + \sum_{k=0}^{M/8-1} \left[ \sin\frac{(M+3+8k)\pi}{M} + \sin\frac{(2M-3-8k)\pi}{M} \right].$$

Applying [74, eq. (1.341.3)] twice yields

$$\begin{aligned} \sum_{i\in\mathcal{I}_{0,0}} \sin\frac{(2i+1)\pi}{M} &= 2 \sum_{k=0}^{M/8-1} \sin\frac{(1+8k)\pi}{M} - 2 \sum_{k=0}^{M/8-1} \sin\frac{(3+8k)\pi}{M} \qquad (77) \\ &= \left( 2\cos\frac{3\pi}{M} - 2\cos\frac{\pi}{M} \right) \csc\frac{4\pi}{M} \\ &= -2 \sin\frac{\pi}{M} \sec\frac{2\pi}{M}, \qquad (78) \end{aligned}$$

where $\csc x = 1/\sin x$ is the cosecant function and $\sec x = 1/\cos x$ is the secant.

Expanding the first term of (76) by the same method as in (77) reveals that this term is zero. Now the result follows from (74) and (78).

D. Proof for the FBC
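By construction, the first bit of the FBC is the same as for the BRGC and the other bits are symmetric around M/2. Therefore, the components in the second dimension of $\|\sum_{i\in\mathcal{I}_{k,0}} x_i\|^2$ are zero for $k = 1, \ldots, m-1$ and (72) can be expressed as

$$\begin{aligned} \alpha^{\mathrm{BI}} &= \frac{4 \log_2 e}{M^2} \left[ \frac{1}{\sin^2(\pi/M)} + \sum_{k=1}^{m-1} \left( \sum_{i\in\mathcal{I}_{k,0}} \cos\frac{(2i+1)\pi}{M} \right)^2 \right] \\ &= \frac{4 \log_2 e}{M^2} \left[ \frac{1}{\sin^2(\pi/M)} + 4 \sum_{k=1}^{m-1} \left( \sum_{i\in\mathcal{I}'_{k,0}} \cos\frac{(2i+1)\pi}{M} \right)^2 \right] \end{aligned} \qquad (79)$$

where $\mathcal{I}'_{k,0} = \{i \in \mathcal{I}_{k,0} : i < M/2\}$.

The indexes $\mathcal{I}'_{k,0}$ of the FBC for $k = 1, \ldots, m-1$ are obtained as the indexes of the NBC of order m - 1. For example, for M = 32, we obtain $\mathcal{I}'_{1,0} = \{0, 1, 2, 3, 4, 5, 6, 7\}$, $\mathcal{I}'_{2,0} = \{0, 1, 2, 3, 8, 9, 10, 11\}$, $\mathcal{I}'_{3,0} = \{0, 1, 4, 5, 8, 9, 12, 13\}$, and $\mathcal{I}'_{4,0} = \{0, 2, 4, 6, 8, 10, 12, 14\}$. This regularity results in a simplified expression of the inner sum in (79), i.e.,

$$\begin{aligned} \sum_{i\in\mathcal{I}'_{k,0}} \cos\frac{(2i+1)\pi}{M} &= \sum_{j=0}^{2^{k-1}-1} \sum_{l=0}^{2^{m-k-1}-1} \cos\frac{(2^{m-k+1} j + 2l + 1)\pi}{M} \qquad (80) \\ &= \frac{\tan(\pi/2^{k+1})}{2 \sin(\pi/M)}, \qquad (81) \end{aligned}$$

where the final result was obtained by using [74, eq. (1.341.3)] twice in (80), after some algebraic manipulation. Using (81) in (79) gives the desired result.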